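1 Distribution Overview
1.1 Discrete Distributions

For each distribution the entries are: notation (see footnote 1), CDF $F_X(x)$, PMF $f_X(x)$, mean $E[X]$, variance $V[X]$, and MGF $M_X(s)$.

Uniform, $\mathrm{Unif}\{a,\dots,b\}$
  $F_X(x) = 0$ for $x<a$; $\frac{\lfloor x\rfloor - a + 1}{b-a+1}$ for $a \le x \le b$; $1$ for $x>b$
  $f_X(x) = \frac{I(a \le x \le b)}{b-a+1}$
  $E[X] = \frac{a+b}{2}$, $V[X] = \frac{(b-a+1)^2-1}{12}$, $M_X(s) = \frac{e^{as}-e^{(b+1)s}}{(b-a+1)(1-e^{s})}$

Bernoulli, $\mathrm{Bern}(p)$
  $F_X(x) = (1-p)^{1-x}$, $f_X(x) = p^x(1-p)^{1-x}$
  $E[X] = p$, $V[X] = p(1-p)$, $M_X(s) = 1-p+pe^{s}$

Binomial, $\mathrm{Bin}(n,p)$
  $F_X(x) = I_{1-p}(n-x,\,x+1)$, $f_X(x) = \binom{n}{x}p^x(1-p)^{n-x}$
  $E[X] = np$, $V[X] = np(1-p)$, $M_X(s) = (1-p+pe^{s})^n$

Multinomial, $\mathrm{Mult}(n,p)$
  $f_X(x) = \frac{n!}{x_1!\cdots x_k!}\,p_1^{x_1}\cdots p_k^{x_k}$ with $\sum_{i=1}^k x_i = n$
  $E[X_i] = np_i$, $V[X_i] = np_i(1-p_i)$, $M_X(s) = \left(\sum_{i=1}^k p_i e^{s_i}\right)^n$

Hypergeometric, $\mathrm{Hyp}(N,m,n)$
  $F_X(x) \approx \Phi\!\left(\frac{x-np}{\sqrt{np(1-p)}}\right)$ with $p = m/N$
  $f_X(x) = \frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}}$
  $E[X] = \frac{nm}{N}$, $V[X] = \frac{nm(N-n)(N-m)}{N^2(N-1)}$

Negative Binomial, $\mathrm{NBin}(r,p)$
  $F_X(x) = I_p(r,\,x+1)$, $f_X(x) = \binom{x+r-1}{r-1}p^r(1-p)^x$
  $E[X] = \frac{r(1-p)}{p}$, $V[X] = \frac{r(1-p)}{p^2}$, $M_X(s) = \left(\frac{p}{1-(1-p)e^{s}}\right)^r$

Geometric, $\mathrm{Geo}(p)$
  $F_X(x) = 1-(1-p)^x$ for $x \in \mathbb{N}^+$, $f_X(x) = p(1-p)^{x-1}$
  $E[X] = \frac{1}{p}$, $V[X] = \frac{1-p}{p^2}$, $M_X(s) = \frac{pe^{s}}{1-(1-p)e^{s}}$

Poisson, $\mathrm{Po}(\lambda)$
  $F_X(x) = e^{-\lambda}\sum_{i=0}^{\lfloor x\rfloor}\frac{\lambda^i}{i!}$, $f_X(x) = \frac{\lambda^x e^{-\lambda}}{x!}$
  $E[X] = \lambda$, $V[X] = \lambda$, $M_X(s) = e^{\lambda(e^{s}-1)}$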
1 We use the notation $\gamma(s,x)$ and $\Gamma(x)$ to refer to the Gamma functions (see §22.1), and use $B(x,y)$ and $I_x$ to refer to the Beta functions (see §22.2).



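1.2 Continuous Distributions

For each distribution the entries are: notation, CDF $F_X(x)$, PDF $f_X(x)$, mean $E[X]$, variance $V[X]$, and MGF $M_X(s)$ where a closed form is listed.

Uniform, $\mathrm{Unif}(a,b)$
  $F_X(x) = 0$ for $x<a$; $\frac{x-a}{b-a}$ for $a<x<b$; $1$ for $x>b$
  $f_X(x) = \frac{I(a<x<b)}{b-a}$
  $E[X] = \frac{a+b}{2}$, $V[X] = \frac{(b-a)^2}{12}$, $M_X(s) = \frac{e^{sb}-e^{sa}}{s(b-a)}$

Normal, $\mathcal{N}(\mu,\sigma^2)$
  $F_X(x) = \Phi(x) = \int_{-\infty}^{x}\phi(t)\,dt$
  $f_X(x) = \phi(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$
  $E[X] = \mu$, $V[X] = \sigma^2$, $M_X(s) = \exp\left\{\mu s + \frac{\sigma^2 s^2}{2}\right\}$

Log-Normal, $\ln\mathcal{N}(\mu,\sigma^2)$
  $F_X(x) = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left[\frac{\ln x-\mu}{\sqrt{2\sigma^2}}\right]$
  $f_X(x) = \frac{1}{x\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(\ln x-\mu)^2}{2\sigma^2}\right\}$
  $E[X] = e^{\mu+\sigma^2/2}$, $V[X] = (e^{\sigma^2}-1)e^{2\mu+\sigma^2}$

Multivariate Normal, $\mathrm{MVN}(\mu,\Sigma)$
  $f_X(x) = (2\pi)^{-k/2}|\Sigma|^{-1/2}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}$
  $E[X] = \mu$, $V[X] = \Sigma$, $M_X(s) = \exp\left\{\mu^T s + \frac{1}{2}s^T\Sigma s\right\}$

Student's t, $\mathrm{Student}(\nu)$
  $f_X(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{x^2}{\nu}\right)^{-(\nu+1)/2}$
  $E[X] = 0$ for $\nu>1$, $V[X] = \frac{\nu}{\nu-2}$ for $\nu>2$

Chi-square, $\chi^2_k$
  $F_X(x) = \frac{1}{\Gamma(k/2)}\gamma\!\left(\frac{k}{2},\frac{x}{2}\right)$, $f_X(x) = \frac{1}{2^{k/2}\Gamma(k/2)}x^{k/2-1}e^{-x/2}$
  $E[X] = k$, $V[X] = 2k$, $M_X(s) = (1-2s)^{-k/2}$ for $s<1/2$

F, $F(d_1,d_2)$
  $F_X(x) = I_{\frac{d_1x}{d_1x+d_2}}\!\left(\frac{d_1}{2},\frac{d_2}{2}\right)$
  $f_X(x) = \frac{\sqrt{\frac{(d_1x)^{d_1}d_2^{d_2}}{(d_1x+d_2)^{d_1+d_2}}}}{x\,B\!\left(\frac{d_1}{2},\frac{d_2}{2}\right)}$
  $E[X] = \frac{d_2}{d_2-2}$ for $d_2>2$, $V[X] = \frac{2d_2^2(d_1+d_2-2)}{d_1(d_2-2)^2(d_2-4)}$ for $d_2>4$

Exponential, $\mathrm{Exp}(\beta)$ (scale $\beta$)
  $F_X(x) = 1-e^{-x/\beta}$, $f_X(x) = \frac{1}{\beta}e^{-x/\beta}$
  $E[X] = \beta$, $V[X] = \beta^2$, $M_X(s) = \frac{1}{1-\beta s}$ for $s<1/\beta$

Gamma, $\mathrm{Gamma}(\alpha,\beta)$ (shape $\alpha$, scale $\beta$)
  $F_X(x) = \frac{\gamma(\alpha,x/\beta)}{\Gamma(\alpha)}$, $f_X(x) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}x^{\alpha-1}e^{-x/\beta}$
  $E[X] = \alpha\beta$, $V[X] = \alpha\beta^2$, $M_X(s) = \left(\frac{1}{1-\beta s}\right)^{\alpha}$ for $s<1/\beta$

Inverse Gamma, $\mathrm{InvGamma}(\alpha,\beta)$
  $F_X(x) = \frac{\Gamma(\alpha,\beta/x)}{\Gamma(\alpha)}$, $f_X(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{-\alpha-1}e^{-\beta/x}$
  $E[X] = \frac{\beta}{\alpha-1}$ for $\alpha>1$, $V[X] = \frac{\beta^2}{(\alpha-1)^2(\alpha-2)}$ for $\alpha>2$
  $M_X(s) = \frac{2(-\beta s)^{\alpha/2}}{\Gamma(\alpha)}K_{\alpha}\!\left(\sqrt{-4\beta s}\right)$

Dirichlet, $\mathrm{Dir}(\alpha)$
  $f_X(x) = \frac{\Gamma\left(\sum_{i=1}^k\alpha_i\right)}{\prod_{i=1}^k\Gamma(\alpha_i)}\prod_{i=1}^k x_i^{\alpha_i-1}$
  $E[X_i] = \frac{\alpha_i}{\sum_{i=1}^k\alpha_i}$, $V[X_i] = \frac{E[X_i](1-E[X_i])}{\sum_{i=1}^k\alpha_i+1}$

Beta, $\mathrm{Beta}(\alpha,\beta)$
  $F_X(x) = I_x(\alpha,\beta)$, $f_X(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$
  $E[X] = \frac{\alpha}{\alpha+\beta}$, $V[X] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
  $M_X(s) = 1+\sum_{k=1}^{\infty}\left(\prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}\right)\frac{s^k}{k!}$

Weibull, $\mathrm{Weibull}(\lambda,k)$
  $F_X(x) = 1-e^{-(x/\lambda)^k}$, $f_X(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}e^{-(x/\lambda)^k}$
  $E[X] = \lambda\Gamma\!\left(1+\frac{1}{k}\right)$, $V[X] = \lambda^2\Gamma\!\left(1+\frac{2}{k}\right)-E[X]^2$
  $M_X(s) = \sum_{n=0}^{\infty}\frac{s^n\lambda^n}{n!}\Gamma\!\left(1+\frac{n}{k}\right)$

Pareto, $\mathrm{Pareto}(x_m,\alpha)$
  $F_X(x) = 1-\left(\frac{x_m}{x}\right)^{\alpha}$ for $x\ge x_m$, $f_X(x) = \frac{\alpha x_m^{\alpha}}{x^{\alpha+1}}$ for $x\ge x_m$
  $E[X] = \frac{\alpha x_m}{\alpha-1}$ for $\alpha>1$, $V[X] = \frac{x_m^2\alpha}{(\alpha-1)^2(\alpha-2)}$ for $\alpha>2$
  $M_X(s) = \alpha(-x_m s)^{\alpha}\Gamma(-\alpha,-x_m s)$ for $s<0$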
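2 Probability Theory

Definitions

• Sample space $\Omega$
• Outcome (point or element) $\omega \in \Omega$
• Event $A \subseteq \Omega$
• $\sigma$-algebra $\mathcal{A}$
  1. $\emptyset \in \mathcal{A}$
  2. $A_1, A_2, \dots \in \mathcal{A} \implies \bigcup_{i=1}^{\infty} A_i \in \mathcal{A}$
  3. $A \in \mathcal{A} \implies \neg A \in \mathcal{A}$
• Probability distribution $P$
  1. $P[A] \ge 0 \quad \forall A$
  2. $P[\Omega] = 1$
  3. $P\left[\bigsqcup_{i=1}^{\infty} A_i\right] = \sum_{i=1}^{\infty} P[A_i]$ for pairwise disjoint $A_i$
• Probability space $(\Omega, \mathcal{A}, P)$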
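Properties

• $P[\emptyset] = 0$
• $B = \Omega \cap B = (A \cup \neg A) \cap B = (A \cap B) \cup (\neg A \cap B)$
• $P[\neg A] = 1 - P[A]$
• $P[B] = P[A \cap B] + P[\neg A \cap B]$
• $P[\Omega] = 1$, $P[\emptyset] = 0$
• $\neg\left(\bigcup_n A_n\right) = \bigcap_n \neg A_n$, $\neg\left(\bigcap_n A_n\right) = \bigcup_n \neg A_n$ (De Morgan)
• $P\left[\bigcup_n A_n\right] = 1 - P\left[\bigcap_n \neg A_n\right]$
• $P[A \cup B] = P[A] + P[B] - P[A \cap B] \implies P[A \cup B] \le P[A] + P[B]$
• $P[A \cup B] = P[A \cap \neg B] + P[\neg A \cap B] + P[A \cap B]$
• $P[A \cap \neg B] = P[A] - P[A \cap B]$

Continuity of Probabilities

• $A_1 \subset A_2 \subset \dots \implies \lim_{n\to\infty} P[A_n] = P[A]$ where $A = \bigcup_{i=1}^{\infty} A_i$
• $A_1 \supset A_2 \supset \dots \implies \lim_{n\to\infty} P[A_n] = P[A]$ where $A = \bigcap_{i=1}^{\infty} A_i$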

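Independence $\perp\!\!\!\perp$

$A \perp\!\!\!\perp B \iff P[A \cap B] = P[A]\,P[B]$

Conditional Probability

$P[A \mid B] = \frac{P[A \cap B]}{P[B]}$, $P[B] > 0$

Law of Total Probability

$P[B] = \sum_{i=1}^{n} P[B \mid A_i]\,P[A_i]$ where $\Omega = \bigsqcup_{i=1}^{n} A_i$

Bayes' Theorem

$P[A_i \mid B] = \frac{P[B \mid A_i]\,P[A_i]}{\sum_{j=1}^{n} P[B \mid A_j]\,P[A_j]}$ where $\Omega = \bigsqcup_{i=1}^{n} A_i$

Inclusion-Exclusion Principle

$P\left[\bigcup_{i=1}^{n} A_i\right] = \sum_{r=1}^{n} (-1)^{r-1} \sum_{1 \le i_1 < \dots < i_r \le n} P\left[\bigcap_{j=1}^{r} A_{i_j}\right]$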
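The law of total probability and Bayes' theorem above can be sketched numerically. The function names and the example numbers (a two-set partition with made-up prevalence and test accuracy) are illustrative assumptions, not part of the original sheet.

```python
def total_probability(priors, likelihoods):
    """P[B] = sum_i P[B | A_i] * P[A_i] for a partition A_1, ..., A_n."""
    return sum(l * p for l, p in zip(likelihoods, priors))


def bayes_posterior(priors, likelihoods):
    """Return [P[A_i | B]] for every set A_i of the partition."""
    evidence = total_probability(priors, likelihoods)
    return [l * p / evidence for l, p in zip(likelihoods, priors)]


# Hypothetical numbers: A_1 = "condition present" (1% prevalence),
# B = "test positive" with P[B | A_1] = 0.95 and P[B | A_2] = 0.05.
priors = [0.01, 0.99]
likelihoods = [0.95, 0.05]
posterior = bayes_posterior(priors, likelihoods)
```

Here `posterior[0]` is $P[A_1 \mid B]$; the posteriors sum to one because the $A_i$ partition $\Omega$.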



3 Random Variables 

Random Variable (RV)

$X : \Omega \to \mathbb{R}$

Probability Mass Function (PMF)

$f_X(x) = P[X = x] = P[\{\omega \in \Omega : X(\omega) = x\}]$

Probability Density Function (PDF)

$P[a \le X \le b] = \int_a^b f(x)\,dx$

Cumulative Distribution Function (CDF)

$F_X : \mathbb{R} \to [0,1]$, $F_X(x) = P[X \le x]$

1. Nondecreasing: $x_1 < x_2 \implies F(x_1) \le F(x_2)$
2. Normalized: $\lim_{x\to-\infty} F(x) = 0$ and $\lim_{x\to\infty} F(x) = 1$
3. Right-continuous: $\lim_{y \downarrow x} F(y) = F(x)$

Conditional density

$P[a \le Y \le b \mid X = x] = \int_a^b f_{Y|X}(y \mid x)\,dy$, $a \le b$

$f_{Y|X}(y \mid x) = \frac{f(x,y)}{f_X(x)}$

Independence

1. $P[X \le x, Y \le y] = P[X \le x]\,P[Y \le y]$
2. $f_{X,Y}(x,y) = f_X(x)\,f_Y(y)$


3.1 Transformations 

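Transformation function

$Z = \varphi(X)$

Discrete

$f_Z(z) = P[\varphi(X) = z] = P[\{x : \varphi(x) = z\}] = P[X \in \varphi^{-1}(z)] = \sum_{x \in \varphi^{-1}(z)} f(x)$

Continuous

$F_Z(z) = P[\varphi(X) \le z] = \int_{A_z} f(x)\,dx$ with $A_z = \{x : \varphi(x) \le z\}$

Special case if $\varphi$ strictly monotone

$f_Z(z) = f_X(\varphi^{-1}(z)) \left|\frac{d}{dz}\varphi^{-1}(z)\right| = f_X(x)\left|\frac{dx}{dz}\right|$

The Rule of the Lazy Statistician

$E[Z] = \int \varphi(x)\,dF_X(x)$

$E[I_A(X)] = \int I_A(x)\,dF_X(x) = \int_A dF_X(x) = P[X \in A]$

Convolution

• $Z := X + Y$: $f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x, z-x)\,dx$, which for $X, Y \ge 0$ equals $\int_0^z f_{X,Y}(x, z-x)\,dx$
• $Z := |X - Y|$: $f_Z(z) = 2\int_0^{\infty} f_{X,Y}(x, z+x)\,dx$
• $Z := Y/X$: $f_Z(z) = \int_{-\infty}^{\infty} |x|\,f_{X,Y}(x, xz)\,dx$, which for $X \perp\!\!\!\perp Y$ equals $\int_{-\infty}^{\infty} |x|\,f_X(x)\,f_Y(xz)\,dx$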



4 Expectation 

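Definition and properties

• $E[X] = \mu_X = \int x\,dF_X(x) = \sum_x x f_X(x)$ if $X$ is discrete, $\int x f_X(x)\,dx$ if $X$ is continuous
• $P[X = c] = 1 \implies E[X] = c$
• $E[cX] = c\,E[X]$
• $E[X + Y] = E[X] + E[Y]$
• $E[XY] = \iint xy\,f_{X,Y}(x,y)\,dx\,dy$
• $E[\varphi(X)] \ne \varphi(E[X])$ in general (cf. Jensen inequality)
• $P[X \ge Y] = 1 \implies E[X] \ge E[Y]$, and $P[X = Y] = 1 \implies E[X] = E[Y]$
• $E[X] = \sum_{x=1}^{\infty} P[X \ge x]$ for $X$ taking values in $\mathbb{N}$

Sample mean

$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$

Conditional expectation

• $E[Y \mid X = x] = \int y\,f(y \mid x)\,dy$
• $E[X] = E[E[X \mid Y]]$
• $E[\varphi(X,Y) \mid X = x] = \int_{-\infty}^{\infty} \varphi(x,y)\,f_{Y|X}(y \mid x)\,dy$
• $E[\varphi(Y,Z) \mid X = x] = \iint \varphi(y,z)\,f_{(Y,Z)|X}(y,z \mid x)\,dy\,dz$
• $E[Y + Z \mid X] = E[Y \mid X] + E[Z \mid X]$
• $E[\varphi(X)Y \mid X] = \varphi(X)\,E[Y \mid X]$
• $E[Y \mid X] = c \implies \mathrm{Cov}[X,Y] = 0$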



5 Variance 

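Definition and properties

• $V[X] = \sigma_X^2 = E[(X - E[X])^2] = E[X^2] - E[X]^2$
• $V\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} V[X_i] + 2\sum_{i<j} \mathrm{Cov}[X_i, X_j]$
• $V\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} V[X_i]$ if $X_i \perp\!\!\!\perp X_j$

Standard deviation

$\mathrm{sd}[X] = \sqrt{V[X]} = \sigma_X$

Covariance

• $\mathrm{Cov}[X,Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]\,E[Y]$
• $\mathrm{Cov}[X,a] = 0$
• $\mathrm{Cov}[X,X] = V[X]$
• $\mathrm{Cov}[X,Y] = \mathrm{Cov}[Y,X]$
• $\mathrm{Cov}[aX,bY] = ab\,\mathrm{Cov}[X,Y]$
• $\mathrm{Cov}[X + a, Y + b] = \mathrm{Cov}[X,Y]$
• $\mathrm{Cov}\left[\sum_{i=1}^{n} X_i, \sum_{j=1}^{m} Y_j\right] = \sum_{i=1}^{n}\sum_{j=1}^{m} \mathrm{Cov}[X_i, Y_j]$

Correlation

$\rho[X,Y] = \frac{\mathrm{Cov}[X,Y]}{\sqrt{V[X]\,V[Y]}}$

Independence

$X \perp\!\!\!\perp Y \implies \rho[X,Y] = 0 \iff \mathrm{Cov}[X,Y] = 0 \iff E[XY] = E[X]\,E[Y]$

Sample variance

$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$

Conditional variance

• $V[Y \mid X] = E[(Y - E[Y \mid X])^2 \mid X] = E[Y^2 \mid X] - E[Y \mid X]^2$
• $V[Y] = E[V[Y \mid X]] + V[E[Y \mid X]]$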



6 Inequalities 

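Cauchy-Schwarz

$E[XY]^2 \le E[X^2]\,E[Y^2]$

Markov (for $\varphi(X) \ge 0$ and $t > 0$)

$P[\varphi(X) \ge t] \le \frac{E[\varphi(X)]}{t}$

Chebyshev

$P[|X - E[X]| \ge t] \le \frac{V[X]}{t^2}$

Chernoff (for $X$ a sum of independent Bernoulli variables with $\mu = E[X]$ and $\delta > 0$)

$P[X \ge (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}$

Jensen

$E[\varphi(X)] \ge \varphi(E[X])$ if $\varphi$ is convex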



7 Distribution Relationships 

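Binomial

• $X_i \sim \mathrm{Bern}(p)$ iid $\implies \sum_{i=1}^{n} X_i \sim \mathrm{Bin}(n,p)$
• $X \sim \mathrm{Bin}(n,p)$, $Y \sim \mathrm{Bin}(m,p)$, $X \perp\!\!\!\perp Y \implies X + Y \sim \mathrm{Bin}(n+m,p)$
• $\lim_{n\to\infty} \mathrm{Bin}(n,p) = \mathrm{Po}(np)$ ($n$ large, $p$ small)
• $\lim_{n\to\infty} \mathrm{Bin}(n,p) = \mathcal{N}(np, np(1-p))$ ($n$ large, $p$ far from 0 and 1)

Negative Binomial

• $X \sim \mathrm{NBin}(1,p) = \mathrm{Geo}(p)$
• $X \sim \mathrm{NBin}(r,p) = \sum_{i=1}^{r} \mathrm{Geo}(p)$
• $X_i \sim \mathrm{NBin}(r_i,p) \implies \sum_i X_i \sim \mathrm{NBin}\left(\sum_i r_i, p\right)$
• $X \sim \mathrm{NBin}(r,p)$, $Y \sim \mathrm{Bin}(s+r,p) \implies P[X \le s] = P[Y \ge r]$

Poisson

• $X_i \sim \mathrm{Po}(\lambda_i) \wedge X_i \perp\!\!\!\perp X_j \implies \sum_{i=1}^{n} X_i \sim \mathrm{Po}\left(\sum_{i=1}^{n} \lambda_i\right)$
• $X_i \sim \mathrm{Po}(\lambda_i) \wedge X_i \perp\!\!\!\perp X_j \implies X_i \,\Big|\, \sum_{j=1}^{n} X_j \sim \mathrm{Bin}\left(\sum_{j=1}^{n} X_j, \frac{\lambda_i}{\sum_{j=1}^{n}\lambda_j}\right)$

Exponential

• $X_i \sim \mathrm{Exp}(\beta) \wedge X_i \perp\!\!\!\perp X_j \implies \sum_{i=1}^{n} X_i \sim \mathrm{Gamma}(n,\beta)$
• Memoryless property: $P[X > x + y \mid X > y] = P[X > x]$

Normal

• $X \sim \mathcal{N}(\mu,\sigma^2) \implies \frac{X-\mu}{\sigma} \sim \mathcal{N}(0,1)$
• $X \sim \mathcal{N}(\mu,\sigma^2) \wedge Z = aX + b \implies Z \sim \mathcal{N}(a\mu + b, a^2\sigma^2)$
• $X \sim \mathcal{N}(\mu_1,\sigma_1^2) \wedge Y \sim \mathcal{N}(\mu_2,\sigma_2^2)$, $X \perp\!\!\!\perp Y \implies X + Y \sim \mathcal{N}(\mu_1+\mu_2, \sigma_1^2+\sigma_2^2)$
• $X_i \sim \mathcal{N}(\mu_i,\sigma_i^2)$ independent $\implies \sum_i X_i \sim \mathcal{N}\left(\sum_i \mu_i, \sum_i \sigma_i^2\right)$
• $P[a < X \le b] = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)$
• $\Phi(-x) = 1 - \Phi(x)$, $\phi'(x) = -x\phi(x)$, $\phi''(x) = (x^2-1)\phi(x)$
• Upper quantile of $\mathcal{N}(0,1)$: $z_{\alpha} = \Phi^{-1}(1-\alpha)$

Gamma

• $X \sim \mathrm{Gamma}(\alpha,\beta) \iff X/\beta \sim \mathrm{Gamma}(\alpha,1)$
• $\mathrm{Gamma}(\alpha,\beta) \sim \sum_{i=1}^{\alpha} \mathrm{Exp}(\beta)$ for integer $\alpha$
• $X_i \sim \mathrm{Gamma}(\alpha_i,\beta) \wedge X_i \perp\!\!\!\perp X_j \implies \sum_i X_i \sim \mathrm{Gamma}\left(\sum_i \alpha_i, \beta\right)$

Beta

• $\frac{1}{B(\alpha,\beta)}x^{\alpha-1}(1-x)^{\beta-1} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$
• $E[X^k] = \frac{B(\alpha+k,\beta)}{B(\alpha,\beta)} = \frac{\alpha+k-1}{\alpha+\beta+k-1}\,E[X^{k-1}]$
• $\mathrm{Beta}(1,1) \sim \mathrm{Unif}(0,1)$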



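8 Probability and Moment Generating Functions

• $G_X(t) = E[t^X]$, $|t| < 1$
• $M_X(t) = G_X(e^t) = E[e^{Xt}] = E\left[\sum_{i=0}^{\infty}\frac{(Xt)^i}{i!}\right] = \sum_{i=0}^{\infty}\frac{E[X^i]}{i!}\,t^i$
• $P[X = 0] = G_X(0)$
• $P[X = 1] = G_X'(0)$
• $P[X = i] = \frac{G_X^{(i)}(0)}{i!}$
• $E[X] = G_X'(1^-)$
• $E[X^k] = M_X^{(k)}(0)$
• $E\left[\frac{X!}{(X-k)!}\right] = G_X^{(k)}(1^-)$
• $V[X] = G_X''(1^-) + G_X'(1^-) - \left(G_X'(1^-)\right)^2$
• $G_X(t) = G_Y(t) \implies X \overset{d}{=} Y$

9 Multivariate Distributions

9.1 Standard Bivariate Normal

Let $X, Z \sim \mathcal{N}(0,1) \wedge X \perp\!\!\!\perp Z$ where $Y = \rho X + \sqrt{1-\rho^2}\,Z$

Joint density

$f(x,y) = \frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left\{-\frac{x^2 + y^2 - 2\rho xy}{2(1-\rho^2)}\right\}$

Conditionals

$(Y \mid X = x) \sim \mathcal{N}(\rho x, 1-\rho^2)$ and $(X \mid Y = y) \sim \mathcal{N}(\rho y, 1-\rho^2)$

Independence

$X \perp\!\!\!\perp Y \iff \rho = 0$

9.2 Bivariate Normal

Let $X \sim \mathcal{N}(\mu_x, \sigma_x^2)$ and $Y \sim \mathcal{N}(\mu_y, \sigma_y^2)$.

$f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{z}{2(1-\rho^2)}\right\}$

$z = \left(\frac{x-\mu_x}{\sigma_x}\right)^2 + \left(\frac{y-\mu_y}{\sigma_y}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right)$

Conditional mean and variance

$E[X \mid Y] = E[X] + \rho\frac{\sigma_X}{\sigma_Y}(Y - E[Y])$

$V[X \mid Y] = \sigma_X^2(1-\rho^2)$

9.3 Multivariate Normal

Covariance matrix $\Sigma$ (precision matrix $\Sigma^{-1}$)

$\Sigma = \begin{pmatrix} V[X_1] & \cdots & \mathrm{Cov}[X_1,X_k] \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}[X_k,X_1] & \cdots & V[X_k] \end{pmatrix}$

If $X \sim \mathcal{N}(\mu,\Sigma)$,

$f_X(x) = (2\pi)^{-n/2}|\Sigma|^{-1/2}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}$

Properties

• $Z \sim \mathcal{N}(0,I) \wedge X = \mu + \Sigma^{1/2}Z \implies X \sim \mathcal{N}(\mu,\Sigma)$
• $X \sim \mathcal{N}(\mu,\Sigma) \implies \Sigma^{-1/2}(X-\mu) \sim \mathcal{N}(0,I)$
• $X \sim \mathcal{N}(\mu,\Sigma) \implies AX \sim \mathcal{N}(A\mu, A\Sigma A^T)$
• $X \sim \mathcal{N}(\mu,\Sigma) \wedge a$ a vector of length $k \implies a^TX \sim \mathcal{N}(a^T\mu, a^T\Sigma a)$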

10 Convergence 

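Let $\{X_1, X_2, \dots\}$ be a sequence of rv's and let $X$ be another rv. Let $F_n$ denote the CDF of $X_n$ and let $F$ denote the CDF of $X$.

Types of convergence

1. In distribution (weakly, in law): $X_n \overset{D}{\to} X$
   $\lim_{n\to\infty} F_n(t) = F(t)$ for all $t$ where $F$ is continuous
2. In probability: $X_n \overset{P}{\to} X$
   $(\forall \varepsilon > 0)\ \lim_{n\to\infty} P[|X_n - X| > \varepsilon] = 0$
3. Almost surely (strongly): $X_n \overset{as}{\to} X$
   $P\left[\lim_{n\to\infty} X_n = X\right] = P\left[\omega \in \Omega : \lim_{n\to\infty} X_n(\omega) = X(\omega)\right] = 1$
4. In quadratic mean ($L_2$): $X_n \overset{qm}{\to} X$
   $\lim_{n\to\infty} E[(X_n - X)^2] = 0$

Relationships

• $X_n \overset{qm}{\to} X \implies X_n \overset{P}{\to} X \implies X_n \overset{D}{\to} X$
• $X_n \overset{as}{\to} X \implies X_n \overset{P}{\to} X$
• $X_n \overset{D}{\to} X \wedge (\exists c \in \mathbb{R})\ P[X = c] = 1 \implies X_n \overset{P}{\to} X$
• $X_n \overset{P}{\to} X \wedge Y_n \overset{P}{\to} Y \implies X_n + Y_n \overset{P}{\to} X + Y$
• $X_n \overset{qm}{\to} X \wedge Y_n \overset{qm}{\to} Y \implies X_n + Y_n \overset{qm}{\to} X + Y$
• $X_n \overset{P}{\to} X \wedge Y_n \overset{P}{\to} Y \implies X_nY_n \overset{P}{\to} XY$
• $X_n \overset{qm}{\to} b \iff \lim_{n\to\infty} E[X_n] = b \wedge \lim_{n\to\infty} V[X_n] = 0$
• $X_1, \dots, X_n$ iid $\wedge\ E[X] = \mu \wedge V[X] < \infty \implies \bar{X}_n \overset{qm}{\to} \mu$

Slutzky's Theorem

• $X_n \overset{D}{\to} X$ and $Y_n \overset{P}{\to} c \implies X_n + Y_n \overset{D}{\to} X + c$
• $X_n \overset{D}{\to} X$ and $Y_n \overset{P}{\to} c \implies X_nY_n \overset{D}{\to} cX$
• In general: $X_n \overset{D}{\to} X$ and $Y_n \overset{D}{\to} Y$ does not imply $X_n + Y_n \overset{D}{\to} X + Y$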

10.1 Law of Large Numbers (LLN) 

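Let $\{X_1, \dots, X_n\}$ be a sequence of iid rv's, $E[X_1] = \mu$.

Weak (WLLN)

$\bar{X}_n \overset{P}{\to} \mu$ as $n \to \infty$

Strong (SLLN)

$\bar{X}_n \overset{as}{\to} \mu$ as $n \to \infty$

10.2 Central Limit Theorem (CLT)

Let $\{X_1, \dots, X_n\}$ be a sequence of iid rv's, $E[X_1] = \mu$, and $V[X_1] = \sigma^2$.

$Z_n := \frac{\bar{X}_n - \mu}{\sqrt{V[\bar{X}_n]}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \overset{D}{\to} Z$ where $Z \sim \mathcal{N}(0,1)$

$\lim_{n\to\infty} P[Z_n \le z] = \Phi(z)$, $z \in \mathbb{R}$

CLT notations

$Z_n \approx \mathcal{N}(0,1)$
$\bar{X}_n \approx \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$
$\bar{X}_n - \mu \approx \mathcal{N}\left(0, \frac{\sigma^2}{n}\right)$
$\sqrt{n}(\bar{X}_n - \mu) \approx \mathcal{N}(0,\sigma^2)$
$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \approx \mathcal{N}(0,1)$

Continuity correction

$P[\bar{X}_n \le x] \approx \Phi\left(\frac{x + \frac{1}{2} - \mu}{\sigma/\sqrt{n}}\right)$

$P[\bar{X}_n \ge x] \approx 1 - \Phi\left(\frac{x - \frac{1}{2} - \mu}{\sigma/\sqrt{n}}\right)$

Delta method

$Y_n \approx \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right) \implies \varphi(Y_n) \approx \mathcal{N}\left(\varphi(\mu), (\varphi'(\mu))^2\frac{\sigma^2}{n}\right)$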



11 Statistical Inference 

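Let $X_1, \dots, X_n \overset{iid}{\sim} F$ if not otherwise noted.

11.1 Point Estimation

• Point estimator $\hat{\theta}_n$ of $\theta$ is a rv: $\hat{\theta}_n = g(X_1, \dots, X_n)$
• $\mathrm{bias}(\hat{\theta}_n) = E[\hat{\theta}_n] - \theta$
• Consistency: $\hat{\theta}_n \overset{P}{\to} \theta$
• Sampling distribution: $F(\hat{\theta}_n)$
• Standard error: $\mathrm{se}(\hat{\theta}_n) = \sqrt{V[\hat{\theta}_n]}$
• Mean squared error: $\mathrm{mse} = E[(\hat{\theta}_n - \theta)^2] = \mathrm{bias}(\hat{\theta}_n)^2 + V[\hat{\theta}_n]$
• $\lim_{n\to\infty} \mathrm{bias}(\hat{\theta}_n) = 0 \wedge \lim_{n\to\infty} \mathrm{se}(\hat{\theta}_n) = 0 \implies \hat{\theta}_n$ is consistent
• Asymptotic normality: $\frac{\hat{\theta}_n - \theta}{\mathrm{se}} \overset{D}{\to} \mathcal{N}(0,1)$
• Slutzky's Theorem often lets us replace $\mathrm{se}(\hat{\theta}_n)$ by some (weakly) consistent estimator $\hat{\sigma}_n$.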



11.2 Normal-Based Confidence Interval 

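Suppose $\hat{\theta}_n \approx \mathcal{N}(\theta, \widehat{\mathrm{se}}^2)$. Let $z_{\alpha/2} = \Phi^{-1}(1 - (\alpha/2))$, i.e., $P[Z > z_{\alpha/2}] = \alpha/2$ and $P[-z_{\alpha/2} < Z < z_{\alpha/2}] = 1 - \alpha$ where $Z \sim \mathcal{N}(0,1)$. Then

$C_n = \hat{\theta}_n \pm z_{\alpha/2}\,\widehat{\mathrm{se}}$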
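As a minimal sketch of the interval $C_n$ above, the quantile $z_{\alpha/2}$ can be taken from Python's standard library. The function name `normal_ci` and the example inputs are illustrative assumptions.

```python
from statistics import NormalDist


def normal_ci(estimate, se, alpha=0.05):
    """Normal-based 1 - alpha interval: estimate +/- z_{alpha/2} * se."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2} = Phi^{-1}(1 - alpha/2)
    return estimate - z * se, estimate + z * se


# Hypothetical estimate 10.0 with estimated standard error 2.0
lower, upper = normal_ci(10.0, 2.0, alpha=0.05)
```

The interval is only as good as the normal approximation and the standard error estimate that feed it.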



11.3 Empirical distribution 

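Empirical Distribution Function (ECDF)

$\hat{F}_n(x) = \frac{\sum_{i=1}^{n} I(X_i \le x)}{n}$

$I(X_i \le x) = 1$ if $X_i \le x$, and $0$ if $X_i > x$

Properties (for any fixed $x$)

• $E[\hat{F}_n(x)] = F(x)$
• $V[\hat{F}_n(x)] = \frac{F(x)(1 - F(x))}{n}$
• $\mathrm{mse} = \frac{F(x)(1 - F(x))}{n} \to 0$
• $\hat{F}_n(x) \overset{P}{\to} F(x)$

Dvoretzky-Kiefer-Wolfowitz (DKW) inequality ($X_1, \dots, X_n \sim F$)

$P\left[\sup_x |F(x) - \hat{F}_n(x)| > \varepsilon\right] \le 2e^{-2n\varepsilon^2}$

Nonparametric $1 - \alpha$ confidence band for $F$

$L(x) = \max\{\hat{F}_n(x) - \varepsilon_n, 0\}$
$U(x) = \min\{\hat{F}_n(x) + \varepsilon_n, 1\}$
$\varepsilon_n = \sqrt{\frac{1}{2n}\log\left(\frac{2}{\alpha}\right)}$

$P[L(x) \le F(x) \le U(x)\ \forall x] \ge 1 - \alpha$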
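The ECDF and the DKW confidence band can be sketched as follows. The helper names `ecdf` and `dkw_band` are assumptions made for illustration, and the linear scan is chosen for clarity rather than speed.

```python
import math


def ecdf(sample):
    """Return F_n as a function: F_n(x) = (1/n) * #{X_i <= x}."""
    data = list(sample)
    n = len(data)

    def F_n(x):
        # Count the observations that are <= x.
        return sum(1 for v in data if v <= x) / n

    return F_n


def dkw_band(sample, alpha=0.05):
    """Return (L, U, eps) for the nonparametric 1 - alpha band."""
    n = len(sample)
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    F_n = ecdf(sample)

    def lower(x):
        return max(F_n(x) - eps, 0.0)

    def upper(x):
        return min(F_n(x) + eps, 1.0)

    return lower, upper, eps


# Hypothetical sample of four observations
F_n = ecdf([3, 1, 2, 2])
L, U, eps = dkw_band([3, 1, 2, 2], alpha=0.05)
```

With only four observations the band is very wide, since $\varepsilon_n$ shrinks at rate $1/\sqrt{n}$.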



11.4 Statistical Functionals 

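• Statistical functional: $T(F)$
• Plug-in estimator of $\theta = T(F)$: $\hat{\theta}_n = T(\hat{F}_n)$
• Linear functional: $T(F) = \int \varphi(x)\,dF_X(x)$
• Plug-in estimator for linear functional: $T(\hat{F}_n) = \int \varphi(x)\,d\hat{F}_n(x) = \frac{1}{n}\sum_{i=1}^{n}\varphi(X_i)$
• Often: $T(\hat{F}_n) \approx \mathcal{N}(T(F), \widehat{\mathrm{se}}^2) \implies T(\hat{F}_n) \pm z_{\alpha/2}\,\widehat{\mathrm{se}}$
• $p$-th quantile: $F^{-1}(p) = \inf\{x : F(x) \ge p\}$
• $\hat{\mu} = \bar{X}_n$
• $\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$
• Skewness: $\hat{\kappa} = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \hat{\mu})^3}{\hat{\sigma}^3}$
• Correlation: $\hat{\rho} = \frac{\sum_{i=1}^{n}(X_i - \bar{X}_n)(Y_i - \bar{Y}_n)}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X}_n)^2}\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y}_n)^2}}$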

12 Parametric Inference 

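Let $\mathfrak{F} = \{f(x;\theta) : \theta \in \Theta\}$ be a parametric model with parameter space $\Theta \subset \mathbb{R}^k$ and parameter $\theta = (\theta_1, \dots, \theta_k)$.

12.1 Method of Moments

$j$-th moment

$\alpha_j(\theta) = E[X^j] = \int x^j\,dF_X(x)$

$j$-th sample moment

$\hat{\alpha}_j = \frac{1}{n}\sum_{i=1}^{n} X_i^j$

Method of moments estimator (MoM): $\hat{\theta}_n$ solves

$\alpha_1(\theta) = \hat{\alpha}_1$
$\alpha_2(\theta) = \hat{\alpha}_2$
$\vdots$
$\alpha_k(\theta) = \hat{\alpha}_k$

Properties of the MoM estimator

• $\hat{\theta}_n$ exists with probability tending to 1
• Consistency: $\hat{\theta}_n \overset{P}{\to} \theta$
• Asymptotic normality:

$\sqrt{n}(\hat{\theta} - \theta) \overset{D}{\to} \mathcal{N}(0,\Sigma)$

where $\Sigma = g\,E[YY^T]\,g^T$, $Y = (X, X^2, \dots, X^k)^T$, $g = (g_1, \dots, g_k)$ and $g_j = \frac{\partial}{\partial\theta}\alpha_j^{-1}(\theta)$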

12.2 Maximum Likelihood 

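Likelihood: $\mathcal{L}_n : \Theta \to [0,\infty)$

$\mathcal{L}_n(\theta) = \prod_{i=1}^{n} f(X_i;\theta)$

Log-likelihood

$\ell_n(\theta) = \log\mathcal{L}_n(\theta) = \sum_{i=1}^{n}\log f(X_i;\theta)$

Maximum likelihood estimator (MLE)

$\mathcal{L}_n(\hat{\theta}_n) = \sup_{\theta}\mathcal{L}_n(\theta)$

Score function

$s(X;\theta) = \frac{\partial}{\partial\theta}\log f(X;\theta)$

Fisher information

$I(\theta) = V_{\theta}[s(X;\theta)]$
$I_n(\theta) = n\,I(\theta)$

Fisher information (exponential family)

$I(\theta) = E_{\theta}\left[-\frac{\partial}{\partial\theta}s(X;\theta)\right]$

Observed Fisher information

$I_n^{obs}(\theta) = -\frac{\partial^2}{\partial\theta^2}\sum_{i=1}^{n}\log f(X_i;\theta)$

Properties of the MLE

• Consistency: $\hat{\theta}_n \overset{P}{\to} \theta$
• Equivariance: $\hat{\theta}_n$ is the MLE $\implies \varphi(\hat{\theta}_n)$ is the MLE of $\varphi(\theta)$
• Asymptotic normality:
  1. $\mathrm{se} \approx \sqrt{1/I_n(\theta)}$ and $\frac{\hat{\theta}_n - \theta}{\mathrm{se}} \overset{D}{\to} \mathcal{N}(0,1)$
  2. $\widehat{\mathrm{se}} \approx \sqrt{1/I_n(\hat{\theta}_n)}$ and $\frac{\hat{\theta}_n - \theta}{\widehat{\mathrm{se}}} \overset{D}{\to} \mathcal{N}(0,1)$
• Asymptotic optimality (or efficiency), i.e., smallest variance for large samples. If $\tilde{\theta}_n$ is any other estimator, the asymptotic relative efficiency is $\mathrm{ARE}(\tilde{\theta}_n, \hat{\theta}_n) = \frac{V[\hat{\theta}_n]}{V[\tilde{\theta}_n]} \le 1$
• Approximately the Bayes estimator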
12.2.1 Delta Method 

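If $\tau = \varphi(\theta)$ where $\varphi$ is differentiable and $\varphi'(\theta) \ne 0$:

$\frac{\hat{\tau}_n - \tau}{\widehat{\mathrm{se}}(\hat{\tau})} \overset{D}{\to} \mathcal{N}(0,1)$

where $\hat{\tau} = \varphi(\hat{\theta})$ is the MLE of $\tau$ and

$\widehat{\mathrm{se}}(\hat{\tau}) = |\varphi'(\hat{\theta})|\,\widehat{\mathrm{se}}(\hat{\theta}_n)$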



12.3 Multiparameter Models 

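Let $\theta = (\theta_1, \dots, \theta_k)$ and $\hat{\theta} = (\hat{\theta}_1, \dots, \hat{\theta}_k)$ be the MLE.

$H_{jj} = \frac{\partial^2\ell_n}{\partial\theta_j^2}$, $H_{jk} = \frac{\partial^2\ell_n}{\partial\theta_j\,\partial\theta_k}$

Fisher information matrix

$I_n(\theta) = -\begin{pmatrix} E_{\theta}[H_{11}] & \cdots & E_{\theta}[H_{1k}] \\ \vdots & \ddots & \vdots \\ E_{\theta}[H_{k1}] & \cdots & E_{\theta}[H_{kk}] \end{pmatrix}$

Under appropriate regularity conditions

$(\hat{\theta} - \theta) \approx \mathcal{N}(0, J_n)$

with $J_n(\theta) = I_n^{-1}$. Further, if $\hat{\theta}_j$ is the $j$-th component of $\hat{\theta}$, then

$\frac{\hat{\theta}_j - \theta_j}{\widehat{\mathrm{se}}_j} \overset{D}{\to} \mathcal{N}(0,1)$

where $\widehat{\mathrm{se}}_j^2 = J_n(j,j)$ and $\mathrm{Cov}[\hat{\theta}_j, \hat{\theta}_k] = J_n(j,k)$

12.3.1 Multiparameter delta method

Let $\tau = \varphi(\theta_1, \dots, \theta_k)$ and let the gradient of $\varphi$ be

$\nabla\varphi = \left(\frac{\partial\varphi}{\partial\theta_1}, \dots, \frac{\partial\varphi}{\partial\theta_k}\right)^T$

Suppose $\nabla\varphi\big|_{\theta=\hat{\theta}} \ne 0$ and $\hat{\tau} = \varphi(\hat{\theta})$. Then,

$\frac{\hat{\tau} - \tau}{\widehat{\mathrm{se}}(\hat{\tau})} \overset{D}{\to} \mathcal{N}(0,1)$

where

$\widehat{\mathrm{se}}(\hat{\tau}) = \sqrt{(\hat{\nabla}\varphi)^T\hat{J}_n(\hat{\nabla}\varphi)}$

and $\hat{J}_n = J_n(\hat{\theta})$ and $\hat{\nabla}\varphi = \nabla\varphi\big|_{\theta=\hat{\theta}}$.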



12.4 Parametric Bootstrap 

Sample from $f(x;\hat{\theta}_n)$ instead of from $\hat{F}_n$, where $\hat{\theta}_n$ could be the MLE or the method of moments estimator.



13 Hypothesis Testing 



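$H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta_1$

Definitions

• Null hypothesis $H_0$
• Alternative hypothesis $H_1$
• Simple hypothesis $\theta = \theta_0$
• Composite hypothesis $\theta > \theta_0$ or $\theta < \theta_0$
• Two-sided test: $H_0 : \theta = \theta_0$ versus $H_1 : \theta \ne \theta_0$
• One-sided test: $H_0 : \theta \le \theta_0$ versus $H_1 : \theta > \theta_0$
• Critical value $c$
• Test statistic $T$
• Rejection region $R = \{x : T(x) > c\}$
• Power function $\beta(\theta) = P_{\theta}[X \in R]$
• Power of a test: $1 - P[\text{Type II error}] = 1 - \beta = \inf_{\theta\in\Theta_1}\beta(\theta)$
• Test size: $\alpha = P[\text{Type I error}] = \sup_{\theta\in\Theta_0}\beta(\theta)$

              Retain $H_0$              Reject $H_0$
$H_0$ true    correct                   Type I error ($\alpha$)
$H_1$ true    Type II error ($\beta$)   correct (power)

p-value

• p-value $= \sup_{\theta\in\Theta_0} P_{\theta}[T(X) \ge T(x)] = \inf\{\alpha : T(x) \in R_{\alpha}\}$
• p-value $= \sup_{\theta\in\Theta_0} P_{\theta}[T(X^*) \ge T(X)] = \inf\{\alpha : T(X) \in R_{\alpha}\}$, where $P_{\theta}[T(X^*) \ge T(X)] = 1 - F_{\theta}(T(X))$ since $T(X^*) \sim F_{\theta}$

p-value        evidence
< 0.01         very strong evidence against $H_0$
0.01 - 0.05    strong evidence against $H_0$
0.05 - 0.1     weak evidence against $H_0$
> 0.1          little or no evidence against $H_0$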

Wald test 

• Two-sided test
• Reject $H_0$ when $|W| > z_{\alpha/2}$ where $W = \frac{\hat{\theta} - \theta_0}{\widehat{\mathrm{se}}}$
• $P[|W| > z_{\alpha/2}] \to \alpha$
• p-value $= P_{\theta_0}[|W| > |w|] \approx P[|Z| > |w|] = 2\Phi(-|w|)$
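A minimal sketch of the Wald test, applied to a Bernoulli proportion. The function names, the data (60 successes in 100 trials), and the plug-in standard error $\sqrt{\hat{p}(1-\hat{p})/n}$ are illustrative assumptions.

```python
import math


def std_normal_cdf(z):
    """Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def wald_test(estimate, null_value, se):
    """Return the Wald statistic W and the two-sided p-value 2 * Phi(-|w|)."""
    w = (estimate - null_value) / se
    p_value = 2.0 * std_normal_cdf(-abs(w))
    return w, p_value


# Hypothetical data: 60 successes in n = 100 trials, H0: p = 0.5
n, successes = 100, 60
p_hat = successes / n
se_hat = math.sqrt(p_hat * (1.0 - p_hat) / n)
w, p_value = wald_test(p_hat, 0.5, se_hat)
reject_at_5_percent = p_value < 0.05
```

Rejecting when the p-value is below $\alpha$ is equivalent to rejecting when $|W| > z_{\alpha/2}$.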
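Likelihood ratio test (LRT)

• $T(X) = \frac{\sup_{\theta\in\Theta}\mathcal{L}_n(\theta)}{\sup_{\theta\in\Theta_0}\mathcal{L}_n(\theta)} = \frac{\mathcal{L}_n(\hat{\theta}_n)}{\mathcal{L}_n(\hat{\theta}_{n,0})}$
• $\lambda(X) = 2\log T(X) \overset{D}{\to} \chi^2_{r-q}$ where $\sum_{i=1}^{k} Z_i^2 \sim \chi^2_k$ and $Z_1, \dots, Z_k \overset{iid}{\sim} \mathcal{N}(0,1)$
• p-value $= P_{\theta_0}[\lambda(X) > \lambda(x)] \approx P[\chi^2_{r-q} > \lambda(x)]$

Multinomial LRT

• MLE: $\hat{p}_n = \left(\frac{X_1}{n}, \dots, \frac{X_k}{n}\right)$
• $T(X) = \frac{\mathcal{L}_n(\hat{p}_n)}{\mathcal{L}_n(p_0)} = \prod_{j=1}^{k}\left(\frac{\hat{p}_j}{p_{0j}}\right)^{X_j}$
• $\lambda(X) = 2\sum_{j=1}^{k} X_j\log\left(\frac{\hat{p}_j}{p_{0j}}\right) \overset{D}{\to} \chi^2_{k-1}$
• The approximate size $\alpha$ LRT rejects $H_0$ when $\lambda(X) \ge \chi^2_{k-1,\alpha}$

Pearson Chi-square Test

• $T = \sum_{j=1}^{k}\frac{(X_j - E[X_j])^2}{E[X_j]}$ where $E[X_j] = np_{0j}$ under $H_0$
• $T \overset{D}{\to} \chi^2_{k-1}$
• p-value $= P[\chi^2_{k-1} > T(x)]$
• Converges to $\chi^2_{k-1}$ faster than the LRT, hence preferable for small $n$

Independence testing

• $I$ rows, $J$ columns, $X$ a multinomial sample of size $n$ over the $I \times J$ cells
• MLEs unconstrained: $\hat{p}_{ij} = \frac{X_{ij}}{n}$
• MLEs under $H_0$: $\hat{p}_{0ij} = \hat{p}_{i\cdot}\,\hat{p}_{\cdot j} = \frac{X_{i\cdot}}{n}\frac{X_{\cdot j}}{n}$
• LRT: $\lambda = 2\sum_{i=1}^{I}\sum_{j=1}^{J} X_{ij}\log\left(\frac{nX_{ij}}{X_{i\cdot}X_{\cdot j}}\right)$
• Pearson chi-square: $T = \sum_{i=1}^{I}\sum_{j=1}^{J}\frac{(X_{ij} - E[X_{ij}])^2}{E[X_{ij}]}$
• LRT and Pearson $\overset{D}{\to} \chi^2_{\nu}$, where $\nu = (I-1)(J-1)$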



14 Bayesian Inference 

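Bayes' Theorem

$f(\theta \mid x^n) = \frac{f(x^n \mid \theta)f(\theta)}{f(x^n)} = \frac{f(x^n \mid \theta)f(\theta)}{\int f(x^n \mid \theta)f(\theta)\,d\theta} \propto \mathcal{L}_n(\theta)f(\theta)$

Definitions

• $X^n = (X_1, \dots, X_n)$
• $x^n = (x_1, \dots, x_n)$
• Prior density $f(\theta)$
• Likelihood $f(x^n \mid \theta)$: joint density of the data. In particular, $X^n$ iid $\implies f(x^n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta) = \mathcal{L}_n(\theta)$
• Posterior density $f(\theta \mid x^n)$
• Normalizing constant $c_n = f(x^n) = \int f(x^n \mid \theta)f(\theta)\,d\theta$
• Kernel: part of a density that depends on $\theta$
• Posterior mean $\bar{\theta}_n = \int\theta f(\theta \mid x^n)\,d\theta = \frac{\int\theta\,\mathcal{L}_n(\theta)f(\theta)\,d\theta}{\int\mathcal{L}_n(\theta)f(\theta)\,d\theta}$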



14.1 Credible Intervals 

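Posterior interval

$P[\theta \in (a,b) \mid x^n] = \int_a^b f(\theta \mid x^n)\,d\theta = 1 - \alpha$

Equal-tail credible interval

$\int_{-\infty}^{a} f(\theta \mid x^n)\,d\theta = \int_{b}^{\infty} f(\theta \mid x^n)\,d\theta = \alpha/2$

Highest posterior density (HPD) region $R_n$

1. $P[\theta \in R_n] = 1 - \alpha$
2. $R_n = \{\theta : f(\theta \mid x^n) > k\}$ for some $k$

If the posterior $f(\theta \mid x^n)$ is unimodal, then $R_n$ is an interval.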

14.2 Function of parameters 

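Let $\tau = \varphi(\theta)$ and $A = \{\theta : \varphi(\theta) \le \tau\}$.

Posterior CDF for $\tau$

$H(\tau \mid x^n) = P[\varphi(\theta) \le \tau \mid x^n] = \int_A f(\theta \mid x^n)\,d\theta$

Posterior density

$h(\tau \mid x^n) = H'(\tau \mid x^n)$

Bayesian delta method

$\tau \mid X^n \approx \mathcal{N}\left(\varphi(\hat{\theta}),\ \widehat{\mathrm{se}}\,|\varphi'(\hat{\theta})|\right)$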



14.3 Priors 



Choice 

• Subjective bayesianism. 

• Objective bayesianism. 

• Robust bayesianism. 

Types 

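• Flat: $f(\theta) \propto$ constant
• Proper: $\int_{-\infty}^{\infty} f(\theta)\,d\theta = 1$
• Improper: $\int_{-\infty}^{\infty} f(\theta)\,d\theta = \infty$
• Jeffreys' prior (transformation-invariant): $f(\theta) \propto \sqrt{I(\theta)}$, and in the multiparameter case $f(\theta) \propto \sqrt{\det(I(\theta))}$
• Conjugate: $f(\theta)$ and $f(\theta \mid x^n)$ belong to the same parametric family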



14.3.1 Conjugate Priors 



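Discrete likelihood

Likelihood | Conjugate prior | Posterior hyperparameters
$\mathrm{Bern}(p)$ | $\mathrm{Beta}(\alpha,\beta)$ | $\alpha + \sum_{i=1}^{n} x_i$, $\beta + n - \sum_{i=1}^{n} x_i$
$\mathrm{Bin}(p)$ | $\mathrm{Beta}(\alpha,\beta)$ | $\alpha + \sum_{i=1}^{n} x_i$, $\beta + \sum_{i=1}^{n} N_i - \sum_{i=1}^{n} x_i$
$\mathrm{NBin}(p)$ | $\mathrm{Beta}(\alpha,\beta)$ | $\alpha + rn$, $\beta + \sum_{i=1}^{n} x_i$
$\mathrm{Po}(\lambda)$ | $\mathrm{Gamma}(\alpha,\beta)$ | $\alpha + \sum_{i=1}^{n} x_i$, $\beta + n$ (with $\beta$ acting as a rate parameter)
$\mathrm{Multinomial}(p)$ | $\mathrm{Dir}(\alpha)$ | $\alpha + \sum_{i=1}^{n} x^{(i)}$
$\mathrm{Geo}(p)$ | $\mathrm{Beta}(\alpha,\beta)$ | $\alpha + n$, $\beta + \sum_{i=1}^{n} x_i - n$ (with $x_i$ counting trials, as in Section 1.1)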


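Continuous likelihood (subscript $c$ denotes constant)

Likelihood | Conjugate prior | Posterior hyperparameters
$\mathrm{Unif}(0,\theta)$ | $\mathrm{Pareto}(x_m,k)$ | $\max\{x_{(n)}, x_m\}$, $k + n$
$\mathrm{Exp}(\lambda)$ | $\mathrm{Gamma}(\alpha,\beta)$ | $\alpha + n$, $\beta + \sum_{i=1}^{n} x_i$
$\mathcal{N}(\mu,\sigma_c^2)$ | $\mathcal{N}(\mu_0,\sigma_0^2)$ | $\left(\frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^{n} x_i}{\sigma_c^2}\right)\Big/\left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma_c^2}\right)$, $\left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma_c^2}\right)^{-1}$
$\mathcal{N}(\mu_c,\sigma^2)$ | Scaled Inverse Chi-square$(\nu,\sigma_0^2)$ | $\nu + n$, $\frac{\nu\sigma_0^2 + \sum_{i=1}^{n}(x_i - \mu_c)^2}{\nu + n}$
$\mathcal{N}(\mu,\sigma^2)$ | Normal-scaled Inverse Gamma$(\lambda,\nu,\alpha,\beta)$ | $\frac{\nu\lambda + n\bar{x}}{\nu + n}$, $\nu + n$, $\alpha + \frac{n}{2}$, $\beta + \frac{1}{2}\sum_{i=1}^{n}(x_i - \bar{x})^2 + \frac{n\nu(\bar{x} - \lambda)^2}{2(\nu + n)}$
$\mathrm{MVN}(\mu,\Sigma_c)$ | $\mathrm{MVN}(\mu_0,\Sigma_0)$ | $\left(\Sigma_0^{-1} + n\Sigma_c^{-1}\right)^{-1}\left(\Sigma_0^{-1}\mu_0 + n\Sigma_c^{-1}\bar{x}\right)$, $\left(\Sigma_0^{-1} + n\Sigma_c^{-1}\right)^{-1}$
$\mathrm{MVN}(\mu_c,\Sigma)$ | Inverse-Wishart$(\kappa,\Psi)$ | $n + \kappa$, $\Psi + \sum_{i=1}^{n}(x_i - \mu_c)(x_i - \mu_c)^T$
$\mathrm{Pareto}(x_{m_c},k)$ | $\mathrm{Gamma}(\alpha,\beta)$ | $\alpha + n$, $\beta + \sum_{i=1}^{n}\log\frac{x_i}{x_{m_c}}$
$\mathrm{Pareto}(x_m,k_c)$ | $\mathrm{Pareto}(x_0,k_0)$ | $x_0$, $k_0 - kn$ where $k_0 > kn$
$\mathrm{Gamma}(\alpha_c,\beta)$ | $\mathrm{Gamma}(\alpha_0,\beta_0)$ | $\alpha_0 + n\alpha_c$, $\beta_0 + \sum_{i=1}^{n} x_i$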



14.4 Bayesian Testing 

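If $H_0 : \theta \in \Theta_0$:

Prior probability $P[H_0] = \int_{\Theta_0} f(\theta)\,d\theta$

Posterior probability $P[H_0 \mid x^n] = \int_{\Theta_0} f(\theta \mid x^n)\,d\theta$

Let $H_0, \dots, H_{K-1}$ be $K$ hypotheses. Suppose $\theta \sim f(\theta \mid H_k)$,

$P[H_k \mid x^n] = \frac{f(x^n \mid H_k)\,P[H_k]}{\sum_{k=0}^{K-1} f(x^n \mid H_k)\,P[H_k]}$

Marginal likelihood

$f(x^n \mid H_i) = \int_{\Theta} f(x^n \mid \theta, H_i)\,f(\theta \mid H_i)\,d\theta$

Posterior odds (of $H_i$ relative to $H_j$)

$\frac{P[H_i \mid x^n]}{P[H_j \mid x^n]} = \frac{f(x^n \mid H_i)}{f(x^n \mid H_j)} \times \frac{P[H_i]}{P[H_j]}$, i.e., Bayes factor $BF_{ij}$ times prior odds

Bayes factor

$\log_{10} BF_{10}$ | $BF_{10}$ | evidence
0 - 0.5 | 1 - 3.2 | Weak
0.5 - 1 | 3.2 - 10 | Moderate
1 - 2 | 10 - 100 | Strong
> 2 | > 100 | Decisive

$p^* = \frac{\frac{p}{1-p}BF_{10}}{1 + \frac{p}{1-p}BF_{10}}$ where $p = P[H_1]$ and $p^* = P[H_1 \mid x^n]$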

15 Exponential Family 

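Scalar parameter

$f_X(x \mid \theta) = h(x)\exp\{\eta(\theta)T(x) - A(\theta)\}$
$= h(x)\,g(\theta)\exp\{\eta(\theta)T(x)\}$

Vector parameter

$f_X(x \mid \theta) = h(x)\exp\left\{\sum_{i=1}^{s}\eta_i(\theta)T_i(x) - A(\theta)\right\}$
$= h(x)\exp\{\eta(\theta)\cdot T(x) - A(\theta)\}$
$= h(x)\,g(\theta)\exp\{\eta(\theta)\cdot T(x)\}$

Natural form

$f_X(x \mid \eta) = h(x)\exp\{\eta\cdot T(x) - A(\eta)\}$
$= h(x)\,g(\eta)\exp\{\eta\cdot T(x)\}$
$= h(x)\,g(\eta)\exp\{\eta^T T(x)\}$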



16 Sampling Methods 

16.1 The Bootstrap 

Let $T_n = g(X_1, \dots, X_n)$ be a statistic.

1. Estimate $V_F[T_n]$ with $V_{\hat{F}_n}[T_n]$.
2. Approximate $V_{\hat{F}_n}[T_n]$ using simulation:
   (a) Repeat the following $B$ times to get $T^*_{n,1}, \dots, T^*_{n,B}$, an iid sample from the sampling distribution implied by $\hat{F}_n$
       i. Sample uniformly (with replacement) $X^*_1, \dots, X^*_n \sim \hat{F}_n$.
       ii. Compute $T^*_n = g(X^*_1, \dots, X^*_n)$.
   (b) Then

$v_{boot} = \hat{V}_{\hat{F}_n} = \frac{1}{B}\sum_{b=1}^{B}\left(T^*_{n,b} - \frac{1}{B}\sum_{r=1}^{B}T^*_{n,r}\right)^2$
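The bootstrap variance estimate can be sketched with the standard library alone. The function name, the fixed seed, the choice $B = 2000$, and the sample values are illustrative assumptions.

```python
import random
import statistics


def bootstrap_variance(sample, statistic, B=2000, seed=0):
    """Return (v_boot, replicates) for the given statistic.

    Each replicate resamples n points uniformly with replacement
    from the observed sample, i.e. it draws from the ECDF F_n.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    replicates = []
    for _ in range(B):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    mean_rep = sum(replicates) / B
    v_boot = sum((t - mean_rep) ** 2 for t in replicates) / B
    return v_boot, replicates


# Hypothetical data; the statistic is the sample mean
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
v_boot, reps = bootstrap_variance(data, statistics.mean, B=2000, seed=1)
se_boot = v_boot ** 0.5
```

For the sample mean, $v_{boot}$ should land close to the plug-in value $\hat{\sigma}^2/n$ computed with the $1/n$ variance, which gives a quick sanity check.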



16.1.1 Bootstrap Confidence Intervals 

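Normal-based interval

$T_n \pm z_{\alpha/2}\,\widehat{\mathrm{se}}_{boot}$

Pivotal interval

1. Location parameter $\theta = T(F)$
2. Pivot $R_n = \hat{\theta}_n - \theta$
3. Let $H(r) = P[R_n \le r]$ be the CDF of $R_n$
4. Let $R^*_{n,b} = \hat{\theta}^*_{n,b} - \hat{\theta}_n$. Approximate $H$ using bootstrap:
   $\hat{H}(r) = \frac{1}{B}\sum_{b=1}^{B} I(R^*_{n,b} \le r)$
5. $\theta^*_{\beta}$ = $\beta$ sample quantile of $(\hat{\theta}^*_{n,1}, \dots, \hat{\theta}^*_{n,B})$
6. $r^*_{\beta}$ = $\beta$ sample quantile of $(R^*_{n,1}, \dots, R^*_{n,B})$, i.e., $r^*_{\beta} = \theta^*_{\beta} - \hat{\theta}_n$
7. Approximate $1 - \alpha$ confidence interval $C_n = (\hat{a}, \hat{b})$ where
   $\hat{a} = \hat{\theta}_n - \hat{H}^{-1}\left(1 - \frac{\alpha}{2}\right) = \hat{\theta}_n - r^*_{1-\alpha/2} = 2\hat{\theta}_n - \theta^*_{1-\alpha/2}$
   $\hat{b} = \hat{\theta}_n - \hat{H}^{-1}\left(\frac{\alpha}{2}\right) = \hat{\theta}_n - r^*_{\alpha/2} = 2\hat{\theta}_n - \theta^*_{\alpha/2}$

Percentile interval

$C_n = \left(\theta^*_{\alpha/2}, \theta^*_{1-\alpha/2}\right)$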



16.2 Rejection Sampling 

Setup 

• We can easily sample from $g(\theta)$
• We want to sample from $h(\theta)$, but it is difficult
• We know $h(\theta)$ up to a proportional constant: $h(\theta) = \frac{k(\theta)}{\int k(\theta)\,d\theta}$
• Envelope condition: we can find $M > 0$ such that $k(\theta) \le M g(\theta)\ \forall\theta$

Algorithm

1. Draw $\theta^{cand} \sim g(\theta)$
2. Generate $u \sim \mathrm{Unif}(0,1)$
3. Accept $\theta^{cand}$ if $u \le \frac{k(\theta^{cand})}{M g(\theta^{cand})}$
4. Repeat until $B$ values of $\theta^{cand}$ have been accepted
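The algorithm above can be sketched as follows. The target kernel $k(\theta) = \theta(1-\theta)$ (an unnormalized Beta(2,2) density), the uniform proposal, the constant $M = 0.25$, and all function names are assumptions chosen for illustration.

```python
import random


def rejection_sample(kernel, draw_g, density_g, M, B, seed=0):
    """Draw B values from h(theta) proportional to kernel(theta).

    Requires the envelope condition kernel(theta) <= M * density_g(theta).
    """
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < B:
        cand = draw_g(rng)                 # step 1: candidate from g
        u = rng.random()                   # step 2: u ~ Unif(0, 1)
        if u <= kernel(cand) / (M * density_g(cand)):
            accepted.append(cand)          # step 3: accept the candidate
    return accepted                        # step 4: stop after B acceptances


def beta22_kernel(theta):
    """Unnormalized Beta(2, 2) density on [0, 1]; its maximum is 0.25."""
    return theta * (1.0 - theta)


draws = rejection_sample(
    beta22_kernel,
    draw_g=lambda rng: rng.random(),   # g = Unif(0, 1)
    density_g=lambda theta: 1.0,
    M=0.25,
    B=5000,
)
```

A tight $M$ matters: the acceptance rate is $\int k(\theta)\,d\theta / M$, which is $2/3$ in this example.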
Example 

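• We can easily sample from the prior $g(\theta) = f(\theta)$
• Target is the posterior $h(\theta) \propto k(\theta) = f(x^n \mid \theta)f(\theta)$
• Envelope condition: $f(x^n \mid \theta) \le f(x^n \mid \hat{\theta}_n) = \mathcal{L}_n(\hat{\theta}_n) \equiv M$
• Algorithm
  1. Draw $\theta^{cand} \sim f(\theta)$
  2. Generate $u \sim \mathrm{Unif}(0,1)$
  3. Accept $\theta^{cand}$ if $u \le \frac{\mathcal{L}_n(\theta^{cand})}{\mathcal{L}_n(\hat{\theta}_n)}$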



16.3 Importance Sampling 

Sample from an importance function g rather than target density h. 
Algorithm to obtain an approximation to E [q{9) \ x n ]: 



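1. Sample from the prior $\theta_1, \dots, \theta_B \overset{iid}{\sim} f(\theta)$
2. $w_i = \frac{\mathcal{L}_n(\theta_i)}{\sum_{i=1}^{B}\mathcal{L}_n(\theta_i)} \quad \forall i = 1, \dots, B$
3. $E[q(\theta) \mid x^n] \approx \sum_{i=1}^{B} q(\theta_i)\,w_i$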
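A minimal sketch of the three steps, using a uniform prior and a Bernoulli likelihood so the answer can be checked against the exact Beta posterior mean $(s+1)/(n+2)$. The function name, the data ($s = 7$ successes in $n = 10$ trials), and $B = 20000$ are illustrative assumptions.

```python
import random


def posterior_expectation(q, likelihood, draw_prior, B=20000, seed=0):
    """Approximate E[q(theta) | x^n] with prior draws weighted by likelihood."""
    rng = random.Random(seed)
    thetas = [draw_prior(rng) for _ in range(B)]        # step 1
    raw = [likelihood(t) for t in thetas]
    total = sum(raw)
    weights = [r / total for r in raw]                  # step 2
    return sum(q(t) * w for t, w in zip(thetas, weights))  # step 3


# Hypothetical data: s successes in n Bernoulli trials, Unif(0, 1) prior on p
s, n = 7, 10


def bernoulli_likelihood(p):
    return p ** s * (1.0 - p) ** (n - s)


estimate = posterior_expectation(
    q=lambda p: p,
    likelihood=bernoulli_likelihood,
    draw_prior=lambda rng: rng.random(),
)
exact = (s + 1) / (n + 2)  # mean of the Beta(s + 1, n - s + 1) posterior
```

Sampling from the prior works well here because prior and posterior overlap heavily; with a sharply peaked likelihood most weights would be near zero.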



17 Decision Theory 

Definitions 

• Unknown quantity affecting our decision: $\theta \in \Theta$
• Decision rule: synonymous for an estimator $\hat{\theta}$
• Action $a \in \mathcal{A}$: possible value of the decision rule. In the estimation context, the action is just an estimate of $\theta$, $\hat{\theta}(x)$.
• Loss function $L$: consequences of taking action $a$ when the true state is $\theta$, or discrepancy between $\theta$ and $\hat{\theta}$, $L : \Theta \times \mathcal{A} \to [-k,\infty)$.

Loss functions

• Squared error loss: $L(\theta,a) = (\theta - a)^2$
• Linear loss: $L(\theta,a) = K_1(\theta - a)$ if $a - \theta < 0$, and $K_2(a - \theta)$ if $a - \theta \ge 0$
• Absolute error loss: $L(\theta,a) = |\theta - a|$ (linear loss with $K_1 = K_2$)
• $L_p$ loss: $L(\theta,a) = |\theta - a|^p$
• Zero-one loss: $L(\theta,a) = 0$ if $a = \theta$, and $1$ if $a \ne \theta$



17.1 Risk 

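Posterior risk

$r(\hat{\theta} \mid x) = \int L(\theta,\hat{\theta}(x))\,f(\theta \mid x)\,d\theta = E_{\theta|X}\left[L(\theta,\hat{\theta}(x))\right]$

(Frequentist) risk

$R(\theta,\hat{\theta}) = \int L(\theta,\hat{\theta}(x))\,f(x \mid \theta)\,dx = E_{X|\theta}\left[L(\theta,\hat{\theta}(X))\right]$

Bayes risk

$r(f,\hat{\theta}) = \iint L(\theta,\hat{\theta}(x))\,f(x,\theta)\,dx\,d\theta = E_{\theta,X}\left[L(\theta,\hat{\theta}(X))\right]$

$r(f,\hat{\theta}) = E_{\theta}\left[E_{X|\theta}\left[L(\theta,\hat{\theta}(X))\right]\right] = E_{\theta}\left[R(\theta,\hat{\theta})\right]$
$r(f,\hat{\theta}) = E_{X}\left[E_{\theta|X}\left[L(\theta,\hat{\theta}(X))\right]\right] = E_{X}\left[r(\hat{\theta} \mid X)\right]$

17.2 Admissibility

• $\hat{\theta}'$ dominates $\hat{\theta}$ if
  $\forall\theta : R(\theta,\hat{\theta}') \le R(\theta,\hat{\theta})$
  $\exists\theta : R(\theta,\hat{\theta}') < R(\theta,\hat{\theta})$
• $\hat{\theta}$ is inadmissible if there is at least one other estimator $\hat{\theta}'$ that dominates it. Otherwise it is called admissible.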
17.3 Bayes Rule

Bayes rule (or Bayes estimator)

• $r(f,\hat{\theta}) = \inf_{\tilde{\theta}} r(f,\tilde{\theta})$
• $\hat{\theta}(x) = \inf r(\hat{\theta} \mid x)\ \forall x \implies r(f,\hat{\theta}) = \int r(\hat{\theta} \mid x)\,f(x)\,dx$
Theorems 

• Squared error loss: posterior mean 

• Absolute error loss: posterior median 

• Zero-one loss: posterior mode 

17.4 Minimax Rules 

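Maximum risk

$\bar{R}(\hat{\theta}) = \sup_{\theta} R(\theta,\hat{\theta})$, $\bar{R}(a) = \sup_{\theta} R(\theta,a)$

Minimax rule

$\sup_{\theta} R(\theta,\hat{\theta}) = \inf_{\tilde{\theta}}\bar{R}(\tilde{\theta}) = \inf_{\tilde{\theta}}\sup_{\theta} R(\theta,\tilde{\theta})$

$\hat{\theta}$ = Bayes rule $\wedge\ \exists c : R(\theta,\hat{\theta}) = c \implies \hat{\theta}$ is minimax

Least favorable prior

$\hat{\theta}^f$ = Bayes rule $\wedge\ R(\theta,\hat{\theta}^f) \le r(f,\hat{\theta}^f)\ \forall\theta \implies \hat{\theta}^f$ is minimax and $f$ is least favorable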

18 Linear Regression 

Definitions 

• Response variable Y 

• Covariate X (aka predictor variable or feature) 

18.1 Simple Linear Regression 

Model 

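$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $E[\varepsilon_i \mid X_i] = 0$, $V[\varepsilon_i \mid X_i] = \sigma^2$

Fitted line

$\hat{r}(x) = \hat{\beta}_0 + \hat{\beta}_1 x$

Predicted (fitted) values

$\hat{Y}_i = \hat{r}(X_i)$

Residuals

$\hat{\varepsilon}_i = Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)$

Residual sums of squares (rss)

$\mathrm{rss}(\hat{\beta}_0,\hat{\beta}_1) = \sum_{i=1}^{n}\hat{\varepsilon}_i^2$

Least square estimates

$\hat{\beta}^T = (\hat{\beta}_0,\hat{\beta}_1)^T$ : $\min_{\hat{\beta}_0,\hat{\beta}_1}\mathrm{rss}$

$\hat{\beta}_0 = \bar{Y}_n - \hat{\beta}_1\bar{X}_n$

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X}_n)(Y_i - \bar{Y}_n)}{\sum_{i=1}^{n}(X_i - \bar{X}_n)^2} = \frac{\sum_{i=1}^{n}X_iY_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n}X_i^2 - n\bar{X}^2}$

$E[\hat{\beta} \mid X^n] = \begin{pmatrix}\beta_0 \\ \beta_1\end{pmatrix}$

$V[\hat{\beta} \mid X^n] = \frac{\sigma^2}{n s_X^2}\begin{pmatrix} n^{-1}\sum_{i=1}^{n}X_i^2 & -\bar{X}_n \\ -\bar{X}_n & 1 \end{pmatrix}$

$\widehat{\mathrm{se}}(\hat{\beta}_0) = \frac{\hat{\sigma}}{s_X\sqrt{n}}\sqrt{\frac{\sum_{i=1}^{n}X_i^2}{n}}$

$\widehat{\mathrm{se}}(\hat{\beta}_1) = \frac{\hat{\sigma}}{s_X\sqrt{n}}$

where $s_X^2 = n^{-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$ and $\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n}\hat{\varepsilon}_i^2$ (unbiased estimate).

Further properties:

• Consistency: $\hat{\beta}_0 \overset{P}{\to} \beta_0$ and $\hat{\beta}_1 \overset{P}{\to} \beta_1$
• Asymptotic normality: $\frac{\hat{\beta}_0 - \beta_0}{\widehat{\mathrm{se}}(\hat{\beta}_0)} \overset{D}{\to} \mathcal{N}(0,1)$ and $\frac{\hat{\beta}_1 - \beta_1}{\widehat{\mathrm{se}}(\hat{\beta}_1)} \overset{D}{\to} \mathcal{N}(0,1)$
• Approximate $1 - \alpha$ confidence intervals for $\beta_0$ and $\beta_1$: $\hat{\beta}_0 \pm z_{\alpha/2}\,\widehat{\mathrm{se}}(\hat{\beta}_0)$ and $\hat{\beta}_1 \pm z_{\alpha/2}\,\widehat{\mathrm{se}}(\hat{\beta}_1)$
• Wald test for $H_0 : \beta_1 = 0$ vs. $H_1 : \beta_1 \ne 0$: reject $H_0$ if $|W| > z_{\alpha/2}$ where $W = \hat{\beta}_1/\widehat{\mathrm{se}}(\hat{\beta}_1)$.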
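The least squares estimates and the standard error of the slope can be sketched directly from the formulas above. The function name `simple_ols` and the four data points are illustrative assumptions.

```python
import math


def simple_ols(x, y):
    """Return (beta0_hat, beta1_hat, rss, se_beta1) for y = b0 + b1 * x + eps."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)   # equals n * s_X^2
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    rss = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)                 # unbiased estimate of sigma^2
    se_beta1 = math.sqrt(sigma2_hat / sxx)     # sigma_hat / (s_X * sqrt(n))
    return beta0, beta1, rss, se_beta1


# Hypothetical data
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
b0, b1, rss, se_b1 = simple_ols(xs, ys)
wald_statistic = b1 / se_b1  # for H0: beta_1 = 0
```

The sketch needs at least three points with non-constant `x`, otherwise `sxx` or `n - 2` is zero.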



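$R^2$

$R^2 = \frac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} = 1 - \frac{\sum_{i=1}^{n}\hat{\varepsilon}_i^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} = 1 - \frac{\mathrm{rss}}{\mathrm{tss}}$

Likelihood

$\mathcal{L} = \prod_{i=1}^{n} f(X_i,Y_i) = \prod_{i=1}^{n} f_X(X_i) \times \prod_{i=1}^{n} f_{Y|X}(Y_i \mid X_i) = \mathcal{L}_1 \times \mathcal{L}_2$

$\mathcal{L}_1 = \prod_{i=1}^{n} f_X(X_i)$

$\mathcal{L}_2 = \prod_{i=1}^{n} f_{Y|X}(Y_i \mid X_i) \propto \sigma^{-n}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(Y_i - (\beta_0 + \beta_1X_i)\right)^2\right\}$

Under the assumption of Normality, the least squares estimator is also the MLE, with

$\hat{\sigma}^2_{mle} = \frac{1}{n}\sum_{i=1}^{n}\hat{\varepsilon}_i^2$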



18.2 Prediction 

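Observe the value $X = x_*$ of the covariate and predict the corresponding outcome $Y_*$.

$\hat{Y}_* = \hat{\beta}_0 + \hat{\beta}_1 x_*$

$V[\hat{Y}_*] = V[\hat{\beta}_0] + x_*^2\,V[\hat{\beta}_1] + 2x_*\,\mathrm{Cov}[\hat{\beta}_0,\hat{\beta}_1]$

Prediction interval

$\hat{\xi}_n^2 = \hat{\sigma}^2\left(\frac{\sum_{i=1}^{n}(X_i - x_*)^2}{n\sum_{i=1}^{n}(X_i - \bar{X})^2} + 1\right)$

$\hat{Y}_* \pm z_{\alpha/2}\,\hat{\xi}_n$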



18.3 Multiple Regression 

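$Y = X\beta + \varepsilon$

where

$X = \begin{pmatrix} X_{11} & \cdots & X_{1k} \\ \vdots & \ddots & \vdots \\ X_{n1} & \cdots & X_{nk} \end{pmatrix}$, $\beta = \begin{pmatrix}\beta_1 \\ \vdots \\ \beta_k\end{pmatrix}$, $\varepsilon = \begin{pmatrix}\varepsilon_1 \\ \vdots \\ \varepsilon_n\end{pmatrix}$

Likelihood

$\mathcal{L}(\mu,\Sigma) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}\mathrm{rss}\right\}$

$\mathrm{rss} = (y - X\beta)^T(y - X\beta) = \|Y - X\beta\|^2 = \sum_{i=1}^{N}(Y_i - x_i^T\beta)^2$

If the $(k \times k)$ matrix $X^TX$ is invertible,

$\hat{\beta} = (X^TX)^{-1}X^TY$

$V[\hat{\beta} \mid X^n] = \sigma^2(X^TX)^{-1}$

$\hat{\beta} \approx \mathcal{N}\left(\beta, \sigma^2(X^TX)^{-1}\right)$

Estimate regression function

$\hat{r}(x) = \sum_{j=1}^{k}\hat{\beta}_j x_j$

Unbiased estimate for $\sigma^2$

$\hat{\sigma}^2 = \frac{1}{n-k}\sum_{i=1}^{n}\hat{\varepsilon}_i^2$, $\hat{\varepsilon} = X\hat{\beta} - Y$

MLE

$\hat{\mu} = \bar{X}$, $\hat{\sigma}^2_{mle} = \frac{n-k}{n}\hat{\sigma}^2$

$1 - \alpha$ Confidence interval

$\hat{\beta}_j \pm z_{\alpha/2}\,\widehat{\mathrm{se}}(\hat{\beta}_j)$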

18.4 Model Selection 

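Consider predicting a new observation $Y^*$ for covariates $X^*$ and let $S \subset J$ denote a subset of the covariates in the model, where $|S| = k$ and $|J| = n$.

Issues

• Underfitting: too few covariates yields high bias
• Overfitting: too many covariates yields high variance

Procedure

1. Assign a score to each model
2. Search through all models to find the one with the highest score

Hypothesis testing

$H_0 : \beta_j = 0$ vs. $H_1 : \beta_j \ne 0 \quad \forall j \in J$

Mean squared prediction error (mspe)

$\mathrm{mspe} = E\left[(\hat{Y}(S) - Y^*)^2\right]$

Prediction risk

$R(S) = \sum_{i=1}^{n}\mathrm{mspe}_i = \sum_{i=1}^{n}E\left[(\hat{Y}_i(S) - Y_i^*)^2\right]$

Training error

$\hat{R}_{tr}(S) = \sum_{i=1}^{n}(\hat{Y}_i(S) - Y_i)^2$

$R^2$

$R^2(S) = 1 - \frac{\mathrm{rss}(S)}{\mathrm{tss}} = 1 - \frac{\hat{R}_{tr}(S)}{\mathrm{tss}} = 1 - \frac{\sum_{i=1}^{n}(\hat{Y}_i(S) - Y_i)^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}$

The training error is a downward-biased estimate of the prediction risk.

$E[\hat{R}_{tr}(S)] < R(S)$

$\mathrm{bias}(\hat{R}_{tr}(S)) = E[\hat{R}_{tr}(S)] - R(S) = -2\sum_{i=1}^{n}\mathrm{Cov}[\hat{Y}_i, Y_i]$

Adjusted $R^2$

$\bar{R}^2(S) = 1 - \frac{n-1}{n-k}\,\frac{\mathrm{rss}}{\mathrm{tss}}$

Mallow's $C_p$ statistic

$\hat{R}(S) = \hat{R}_{tr}(S) + 2k\hat{\sigma}^2$ = lack of fit + complexity penalty

Akaike Information Criterion (AIC)

$\mathrm{AIC}(S) = \ell_n(\hat{\beta}_S, \hat{\sigma}_S^2) - k$

Bayesian Information Criterion (BIC)

$\mathrm{BIC}(S) = \ell_n(\hat{\beta}_S, \hat{\sigma}_S^2) - \frac{k}{2}\log n$

Validation and training

$\hat{R}_V(S) = \sum_{i=1}^{m}(\hat{Y}_i^*(S) - Y_i^*)^2$, $m = |\{\text{validation data}\}|$, often $\frac{n}{4}$ or $\frac{n}{2}$

Leave-one-out cross-validation

$\hat{R}_{CV}(S) = \sum_{i=1}^{n}(Y_i - \hat{Y}_{(i)})^2 = \sum_{i=1}^{n}\left(\frac{Y_i - \hat{Y}_i(S)}{1 - U_{ii}(S)}\right)^2$

$U(S) = X_S(X_S^TX_S)^{-1}X_S^T$ ("hat matrix")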



19 Non- parametric Function Estimation 

19.1 Density Estimation 

Estimate $f(x)$, where $P[X \in A] = \int_A f(x)\,dx$.

Integrated square error (ise)

$L(f,\hat{f}_n) = \int\left(f(x) - \hat{f}_n(x)\right)^2 dx = J(h) + \int f^2(x)\,dx$

Frequentist risk

$R(f,\hat{f}_n) = E\left[L(f,\hat{f}_n)\right] = \int b^2(x)\,dx + \int v(x)\,dx$

$b(x) = E[\hat{f}_n(x)] - f(x)$
$v(x) = V[\hat{f}_n(x)]$



19.1.1 Histograms 

Definitions 

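• Number of bins $m$
• Binwidth $h = \frac{1}{m}$
• Bin $B_j$ has $\nu_j$ observations
• Define $\hat{p}_j = \nu_j/n$ and $p_j = \int_{B_j} f(u)\,du$

Histogram estimator

$\hat{f}_n(x) = \sum_{j=1}^{m}\frac{\hat{p}_j}{h}\,I(x \in B_j)$

$E[\hat{f}_n(x)] = \frac{p_j}{h}$

$V[\hat{f}_n(x)] = \frac{p_j(1 - p_j)}{nh^2}$

$R(\hat{f}_n,f) \approx \frac{h^2}{12}\int(f'(u))^2\,du + \frac{1}{nh}$

$h^* = \frac{1}{n^{1/3}}\left(\frac{6}{\int(f'(u))^2\,du}\right)^{1/3}$

$R^*(\hat{f}_n,f) \approx \frac{C}{n^{2/3}}$, $C = \left(\frac{3}{4}\right)^{2/3}\left(\int(f'(u))^2\,du\right)^{1/3}$

Cross-validation estimate of $E[J(h)]$

$\hat{J}_{CV}(h) = \int\hat{f}_n^2(x)\,dx - \frac{2}{n}\sum_{i=1}^{n}\hat{f}_{(-i)}(X_i) = \frac{2}{(n-1)h} - \frac{n+1}{(n-1)h}\sum_{j=1}^{m}\hat{p}_j^2$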



19.1.2 Kernel Density Estimator (KDE) 

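Kernel $K$

• $K(x) \ge 0$
• $\int K(x)\,dx = 1$
• $\int xK(x)\,dx = 0$
• $\int x^2K(x)\,dx \equiv \sigma_K^2 > 0$

KDE

$\hat{f}_n(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h}K\left(\frac{x - X_i}{h}\right)$

$R(f,\hat{f}_n) \approx \frac{1}{4}(h\sigma_K)^4\int(f''(x))^2\,dx + \frac{1}{nh}\int K^2(x)\,dx$

$h^* = \frac{c_1^{-2/5}c_2^{1/5}c_3^{-1/5}}{n^{1/5}}$, $c_1 = \sigma_K^2$, $c_2 = \int K^2(x)\,dx$, $c_3 = \int(f''(x))^2\,dx$

$R^*(f,\hat{f}_n) = \frac{c_4}{n^{4/5}}$, $c_4 = \frac{5}{4}(\sigma_K^2)^{2/5}\left(\int K^2(x)\,dx\right)^{4/5}\left(\int(f''(x))^2\,dx\right)^{1/5}$

Epanechnikov Kernel

$K(x) = \frac{3}{4\sqrt{5}}\left(1 - \frac{x^2}{5}\right)$ if $|x| < \sqrt{5}$, and $0$ otherwise

Cross-validation estimate of $E[J(h)]$

$\hat{J}_{CV}(h) = \int\hat{f}_n^2(x)\,dx - \frac{2}{n}\sum_{i=1}^{n}\hat{f}_{(-i)}(X_i) \approx \frac{1}{hn^2}\sum_{i=1}^{n}\sum_{j=1}^{n}K^*\left(\frac{X_i - X_j}{h}\right) + \frac{2}{nh}K(0)$

$K^*(x) = K^{(2)}(x) - 2K(x)$, $K^{(2)}(x) = \int K(x - y)K(y)\,dy$

19.2 Non-parametric Regression

Estimate $r(x)$ where $r(x) = E[Y \mid X = x]$. Consider pairs of points $(x_1,Y_1), \dots, (x_n,Y_n)$ related by

$Y_i = r(x_i) + \varepsilon_i$
$E[\varepsilon_i] = 0$
$V[\varepsilon_i] = \sigma^2$

k-nearest Neighbor Estimator

$\hat{r}(x) = \frac{1}{k}\sum_{i : x_i \in N_k(x)} Y_i$ where $N_k(x) = \{k \text{ values of } x_1, \dots, x_n \text{ closest to } x\}$

Nadaraya-Watson Kernel Estimator

$\hat{r}(x) = \sum_{i=1}^{n} w_i(x)\,Y_i$

$w_i(x) = \frac{K\left(\frac{x - x_i}{h}\right)}{\sum_{j=1}^{n}K\left(\frac{x - x_j}{h}\right)} \in [0,1]$

$R(\hat{r}_n,r) \approx \frac{h^4}{4}\left(\int x^2K(x)\,dx\right)^2\int\left(r''(x) + 2r'(x)\frac{f'(x)}{f(x)}\right)^2 dx + \int\frac{\sigma^2\int K^2(x)\,dx}{nhf(x)}\,dx$

$h^* \approx \frac{c_1}{n^{1/5}}$

$R^*(\hat{r}_n,r) \approx \frac{c_2}{n^{4/5}}$

Cross-validation estimate of $E[J(h)]$

$\hat{J}_{CV}(h) = \sum_{i=1}^{n}\left(Y_i - \hat{r}_{(-i)}(x_i)\right)^2 = \sum_{i=1}^{n}\frac{(Y_i - \hat{r}(x_i))^2}{\left(1 - \frac{K(0)}{\sum_{j=1}^{n}K\left(\frac{x_i - x_j}{h}\right)}\right)^2}$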



19.3 Smoothing Using Orthogonal Functions 

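Approximation

$r(x) = \sum_{j=1}^{\infty}\beta_j\phi_j(x) \approx \sum_{j=1}^{J}\beta_j\phi_j(x)$

Multivariate regression

$Y = \Phi\beta + \eta$

where $\eta_i = \varepsilon_i$ and $\Phi = \begin{pmatrix}\phi_0(x_1) & \cdots & \phi_J(x_1) \\ \vdots & \ddots & \vdots \\ \phi_0(x_n) & \cdots & \phi_J(x_n)\end{pmatrix}$

Least squares estimator

$\hat{\beta} = (\Phi^T\Phi)^{-1}\Phi^TY \approx \frac{1}{n}\Phi^TY$ (for equally spaced observations only)

Cross-validation estimate of $E[J(h)]$

$\hat{R}_{CV}(J) = \sum_{i=1}^{n}\left(Y_i - \sum_{j=1}^{J}\phi_j(x_i)\,\hat{\beta}_{j,(-i)}\right)^2$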
»=i V j=i 



20 Stochastic Processes 

Stochastic Process 



$\{X_t : t \in T\}$, where $T = \{0, \pm 1, \dots\} = \mathbb{Z}$ (discrete) or $T = [0,\infty)$ (continuous)



• Notations X t , X(t) 

• State space X 

• Index set T 

20.1 Markov Chains 

Markov chain 

$P[X_n = x \mid X_0, \dots, X_{n-1}] = P[X_n = x \mid X_{n-1}] \quad \forall n \in T, x \in \mathcal{X}$

Transition probabilities

$p_{ij} \equiv P[X_{n+1} = j \mid X_n = i]$
$p_{ij}(n) \equiv P[X_{m+n} = j \mid X_m = i]$ ($n$-step)

Transition matrix $P$ ($n$-step: $P_n$)

• $(i,j)$ element is $p_{ij}$
• $p_{ij} \ge 0$
• $\sum_j p_{ij} = 1$

Chapman-Kolmogorov

$p_{ij}(m + n) = \sum_k p_{ik}(m)\,p_{kj}(n)$
$P_{m+n} = P_mP_n$
$P_n = P \times \dots \times P = P^n$

Marginal probability

$\mu_n = (\mu_n(1), \dots, \mu_n(N))$ where $\mu_n(i) = P[X_n = i]$
$\mu_0$ = initial distribution
$\mu_n = \mu_0P^n$
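The relations $P_n = P^n$ and $\mu_n = \mu_0P^n$ can be sketched with plain nested lists. The helper names and the two-state transition matrix are illustrative assumptions.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def mat_pow(P, n):
    """n-step transition matrix P^n (P^0 is the identity)."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


def marginal(mu0, P, n):
    """Marginal distribution mu_n = mu_0 P^n as a list."""
    return mat_mul([mu0], mat_pow(P, n))[0]


# Hypothetical two-state chain, started in state 0
P = [[0.9, 0.1],
     [0.5, 0.5]]
mu2 = marginal([1.0, 0.0], P, 2)
```

Each row of `P` must sum to one; the same then holds for every power of `P` and for each marginal `mu_n`.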



20.2 Poisson Processes 

Poisson process 

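• $\{X_t : t \in [0,\infty)\}$ = number of events up to and including time $t$
• $X_0 = 0$
• Independent increments: $\forall t_0 < \dots < t_n : X_{t_1} - X_{t_0} \perp\!\!\!\perp \dots \perp\!\!\!\perp X_{t_n} - X_{t_{n-1}}$
• Intensity function $\lambda(t)$
  - $P[X_{t+h} - X_t = 1] = \lambda(t)h + o(h)$
  - $P[X_{t+h} - X_t = 2] = o(h)$
• $X_{s+t} - X_s \sim \mathrm{Po}(m(s+t) - m(s))$ where $m(t) = \int_0^t\lambda(s)\,ds$

Homogeneous Poisson process

$\lambda(t) \equiv \lambda \implies X_t \sim \mathrm{Po}(\lambda t)$, $\lambda > 0$

Waiting times

$W_t$ := time at which $X_t$ occurs

$W_t \sim \mathrm{Gamma}\left(t, \frac{1}{\lambda}\right)$

Interarrival times

$S_t = W_{t+1} - W_t$

$S_t \sim \mathrm{Exp}\left(\frac{1}{\lambda}\right)$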



21 Time Series 

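Mean function

$\mu_{xt} = E[x_t] = \int_{-\infty}^{\infty} x f_t(x)\,dx$

Autocovariance function

$\gamma_x(s,t) = E[(x_s - \mu_s)(x_t - \mu_t)] = E[x_sx_t] - \mu_s\mu_t$
$\gamma_x(t,t) = E[(x_t - \mu_t)^2] = V[x_t]$

Autocorrelation function (ACF)

$\rho(s,t) = \frac{\mathrm{Cov}[x_s,x_t]}{\sqrt{V[x_s]\,V[x_t]}} = \frac{\gamma(s,t)}{\sqrt{\gamma(s,s)\gamma(t,t)}}$

Cross-covariance function (CCV)

$\gamma_{xy}(s,t) = E[(x_s - \mu_{x_s})(y_t - \mu_{y_t})]$

Cross-correlation function (CCF)

$\rho_{xy}(s,t) = \frac{\gamma_{xy}(s,t)}{\sqrt{\gamma_x(s,s)\gamma_y(t,t)}}$

Backshift operator

$B^k(x_t) = x_{t-k}$

Difference operator

$\nabla^d = (1 - B)^d$

White noise

• $w_t \sim \mathrm{wn}(0,\sigma_w^2)$
• Gaussian: $w_t \overset{iid}{\sim} \mathcal{N}(0,\sigma_w^2)$
• $E[w_t] = 0 \quad t \in T$
• $V[w_t] = \sigma_w^2 \quad t \in T$
• $\gamma_w(s,t) = 0 \quad s \ne t \wedge s,t \in T$

Random walk

• Drift $\delta$
• $x_t = \delta t + \sum_{j=1}^{t}w_j$
• $E[x_t] = \delta t$

Symmetric moving average

$m_t = \sum_{j=-k}^{k}a_jx_{t-j}$ where $a_j = a_{-j} \ge 0$ and $\sum_{j=-k}^{k}a_j = 1$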



21.1 Stationary Time Series 

Strictly stationary 

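$P[x_{t_1} \le c_1, \ldots, x_{t_k} \le c_k] = P[x_{t_1+h} \le c_1, \ldots, x_{t_k+h} \le c_k]$

$\forall k \in \mathbb{N},\ t_k, c_k, h \in \mathbb{Z}$

Weakly stationary

• $E[x_t^2] < \infty \quad \forall t \in \mathbb{Z}$

• $E[x_t] = m \quad \forall t \in \mathbb{Z}$

• $\gamma_x(s, t) = \gamma_x(s + r, t + r) \quad \forall r, s, t \in \mathbb{Z}$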

Autocovariance function 

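• $\gamma(h) = E[(x_{t+h} - \mu)(x_t - \mu)] \quad \forall h \in \mathbb{Z}$

• $\gamma(0) = E[(x_t - \mu)^2]$

• $\gamma(0) \ge 0$

• $\gamma(0) \ge |\gamma(h)|$

• $\gamma(h) = \gamma(-h)$

Autocorrelation function (ACF)

$\rho_x(h) = \frac{\mathrm{Cov}[x_{t+h}, x_t]}{\sqrt{V[x_{t+h}]\, V[x_t]}} = \frac{\gamma(t + h, t)}{\sqrt{\gamma(t + h, t + h)\, \gamma(t, t)}} = \frac{\gamma(h)}{\gamma(0)}$

Jointly stationary time series

$\gamma_{xy}(h) = E[(x_{t+h} - \mu_x)(y_t - \mu_y)]$

$\rho_{xy}(h) = \frac{\gamma_{xy}(h)}{\sqrt{\gamma_x(0)\, \gamma_y(0)}}$

Linear process

$x_t = \mu + \sum_{j=-\infty}^{\infty} \psi_j w_{t-j}$ where $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$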



21.2 Estimation of Correlation 

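Sample mean

$\bar{x} = \frac{1}{n} \sum_{t=1}^{n} x_t$

Sample variance

$V[\bar{x}] = \frac{1}{n} \sum_{h=-n}^{n} \left(1 - \frac{|h|}{n}\right) \gamma_x(h)$

Sample autocovariance function

$\hat{\gamma}(h) = \frac{1}{n} \sum_{t=1}^{n-h} (x_{t+h} - \bar{x})(x_t - \bar{x})$

Sample autocorrelation function

$\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}$

Sample cross-covariance function

$\hat{\gamma}_{xy}(h) = \frac{1}{n} \sum_{t=1}^{n-h} (x_{t+h} - \bar{x})(y_t - \bar{y})$

Sample cross-correlation function

$\hat{\rho}_{xy}(h) = \frac{\hat{\gamma}_{xy}(h)}{\sqrt{\hat{\gamma}_x(0)\, \hat{\gamma}_y(0)}}$

Properties

• $\sigma_{\hat{\rho}_x(h)} = \frac{1}{\sqrt{n}}$ if $x_t$ is white noise

• $\sigma_{\hat{\rho}_{xy}(h)} = \frac{1}{\sqrt{n}}$ if $x_t$ or $y_t$ is white noise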
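
The sample autocovariance and autocorrelation estimators translate directly into code. The sketch below uses 0-based indexing and a tiny made-up series `x`; the function names are illustrative.

```python
def sample_autocov(x, h):
    """gamma_hat(h) = (1/n) * sum_{t=1}^{n-h} (x_{t+h} - xbar)(x_t - xbar)."""
    n = len(x)
    h = abs(h)  # gamma_hat(h) = gamma_hat(-h)
    xbar = sum(x) / n
    return sum((x[t + h] - xbar) * (x[t] - xbar) for t in range(n - h)) / n

def sample_acf(x, h):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    return sample_autocov(x, h) / sample_autocov(x, 0)

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up data for illustration
acf_lag1 = sample_acf(x, 1)    # 0.8 / 2.0 = 0.4 for this series
```

Note the divisor is $n$ rather than $n - h$, matching the estimator above.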

21.3 Non-Stationary Time Series 

Classical decomposition model 



$x_t = \mu_t + s_t + w_t$

• $\mu_t$ = trend

• $s_t$ = seasonal component

• $w_t$ = random noise term



21.3.1 Detrending 

Least squares 

1. Choose trend model, e.g., $\mu_t = \beta_0 + \beta_1 t + \beta_2 t^2$

2. Minimize RSS to obtain trend estimate $\hat{\mu}_t = \hat{\beta}_0 + \hat{\beta}_1 t + \hat{\beta}_2 t^2$

3. Residuals = noise $w_t$

Moving average 

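• The low-pass filter $v_t$ is a symmetric moving average $m_t$ with $a_j = \frac{1}{2k+1}$:

$v_t = \frac{1}{2k+1} \sum_{i=-k}^{k} x_{t-i}$

• If $\frac{1}{2k+1} \sum_{i=-k}^{k} w_{t-i} \approx 0$, a linear trend function $\mu_t = \beta_0 + \beta_1 t$ passes without distortion

Differencing

• $\mu_t = \beta_0 + \beta_1 t \implies \nabla \mu_t = \beta_1$ (differencing $x_t$ removes the linear trend)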
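
Both detrending claims can be checked on a noise-free linear trend. This is a minimal sketch; the coefficients `b0`, `b1`, the window half-width `k`, and the helper names are assumed values for illustration.

```python
def moving_average(x, k):
    """v_t = 1/(2k+1) * sum_{i=-k}^{k} x_{t-i}, computed for interior t only."""
    width = 2 * k + 1
    return [sum(x[t - k:t + k + 1]) / width for t in range(k, len(x) - k)]

def difference(x):
    """First difference: (1 - B) x_t = x_t - x_{t-1}."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]

b0, b1, k = 2.0, 0.5, 2                      # illustrative trend and window
trend = [b0 + b1 * t for t in range(20)]     # mu_t = b0 + b1 * t, no noise
smoothed = moving_average(trend, k)          # aligned with trend[k:-k]
diffed = difference(trend)                   # every entry equals b1
```

The smoothed values coincide with the interior trend values, and the differenced series is constant at `b1`.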

21.4 ARIMA models 

Autoregressive polynomial 

$\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p, \quad z \in \mathbb{C} \wedge \phi_p \ne 0$

Autoregressive operator

$\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$

Autoregressive model order $p$, AR($p$)

$x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t \iff \phi(B) x_t = w_t$

AR(1) 

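• $x_t = \phi^k x_{t-k} + \sum_{j=0}^{k-1} \phi^j w_{t-j} \;\xrightarrow{k \to \infty,\ |\phi| < 1}\; \sum_{j=0}^{\infty} \phi^j w_{t-j}$ (a linear process)

• $\gamma(h) = \mathrm{Cov}[x_{t+h}, x_t] = \frac{\sigma_w^2 \phi^h}{1 - \phi^2}$

• $\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \phi^h$

• $\rho(h) = \phi\, \rho(h - 1), \quad h = 1, 2, \ldots$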



Moving average polynomial 

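$\theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q, \quad z \in \mathbb{C} \wedge \theta_q \ne 0$

Moving average operator

$\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$

MA($q$) (moving average model order $q$)

$x_t = w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q} \iff x_t = \theta(B) w_t$

$E[x_t] = \sum_{j=0}^{q} \theta_j E[w_{t-j}] = 0$ (with $\theta_0 = 1$)

$\gamma(h) = \mathrm{Cov}[x_{t+h}, x_t] = \begin{cases} \sigma_w^2 \sum_{j=0}^{q-h} \theta_j \theta_{j+h} & 0 \le h \le q \\ 0 & h > q \end{cases}$

MA(1)

$x_t = w_t + \theta w_{t-1}$

$\gamma(h) = \begin{cases} (1 + \theta^2)\sigma_w^2 & h = 0 \\ \theta \sigma_w^2 & h = 1 \\ 0 & h > 1 \end{cases}$

$\rho(h) = \begin{cases} \frac{\theta}{1 + \theta^2} & h = 1 \\ 0 & h > 1 \end{cases}$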
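
The MA($q$) autocovariance formula is a finite sum and can be evaluated directly. The sketch below assumes the convention $\theta_0 = 1$; the function name and the example coefficient are illustrative.

```python
def ma_autocov(theta, sigma2, h):
    """gamma(h) for MA(q) with coefficients theta_1..theta_q and theta_0 = 1."""
    coeffs = [1.0] + list(theta)
    q = len(theta)
    h = abs(h)  # gamma(h) = gamma(-h)
    if h > q:
        return 0.0
    return sigma2 * sum(coeffs[j] * coeffs[j + h] for j in range(q - h + 1))

# MA(1) with a made-up coefficient theta = 0.5 and noise variance 2.
theta, sigma2 = [0.5], 2.0
gamma0 = ma_autocov(theta, sigma2, 0)   # (1 + theta^2) * sigma2
gamma1 = ma_autocov(theta, sigma2, 1)   # theta * sigma2
rho1 = gamma1 / gamma0                  # theta / (1 + theta^2)
```

For $q = 1$ this reproduces the MA(1) special case, including the cut-off at lags beyond $q$.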



ARMA (p, q) 



$x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}$

$\phi(B) x_t = \theta(B) w_t$

Partial autocorrelation function (PACF)

• $x_i^{h-1}$ = regression of $x_i$ on $\{x_{h-1}, x_{h-2}, \ldots, x_1\}$

• $\phi_{hh} = \mathrm{corr}(x_h - x_h^{h-1},\ x_0 - x_0^{h-1}), \quad h \ge 2$

• E.g., $\phi_{11} = \mathrm{corr}(x_1, x_0) = \rho(1)$

ARIMA($p, d, q$)

$\nabla^d x_t = (1 - B)^d x_t$ is ARMA($p, q$)
$\phi(B)(1 - B)^d x_t = \theta(B) w_t$
Exponentially Weighted Moving Average (EWMA) 

$x_t = x_{t-1} + w_t - \lambda w_{t-1}$

$x_t = \sum_{j=1}^{\infty} (1 - \lambda) \lambda^{j-1} x_{t-j} + w_t$ when $|\lambda| < 1$

$\tilde{x}_{n+1} = (1 - \lambda) x_n + \lambda \tilde{x}_n$
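
The EWMA forecast recursion $\tilde{x}_{n+1} = (1 - \lambda) x_n + \lambda \tilde{x}_n$ needs a starting forecast, which the formulas above do not specify. The sketch below takes it as an explicit argument `initial`; that argument and the function name are assumptions made for illustration.

```python
def ewma_forecasts(x, lam, initial):
    """One-step-ahead forecasts via x~_{n+1} = (1 - lam) * x_n + lam * x~_n.

    `initial` is an assumed starting forecast for the first observation.
    Returns a list with one more entry than x.
    """
    forecasts = [initial]
    for value in x:
        forecasts.append((1 - lam) * value + lam * forecasts[-1])
    return forecasts

# Made-up observations; lam = 0.5 weights the newest value and the old forecast equally.
result = ewma_forecasts([1.0, 2.0], lam=0.5, initial=0.0)
# Step 1: 0.5 * 1.0 + 0.5 * 0.0 = 0.5
# Step 2: 0.5 * 2.0 + 0.5 * 0.5 = 1.25
```

A smaller $\lambda$ makes the forecast track recent observations more closely; a constant series with a matching initial forecast stays constant.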



Seasonal ARIMA 



• Denoted by ARIMA$(p, d, q) \times (P, D, Q)_s$

• $\Phi_P(B^s)\, \phi(B)\, \nabla_s^D \nabla^d x_t = \delta + \Theta_Q(B^s)\, \theta(B)\, w_t$



21.4.1 Causality and Invertibility 

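ARMA($p, q$) is causal (future-independent) $\iff \exists \{\psi_j\} : \sum_{j=0}^{\infty} |\psi_j| < \infty$ such that

$x_t = \sum_{j=0}^{\infty} \psi_j w_{t-j} = \psi(B) w_t$

ARMA($p, q$) is invertible $\iff \exists \{\pi_j\} : \sum_{j=0}^{\infty} |\pi_j| < \infty$ such that

$\pi(B) x_t = \sum_{j=0}^{\infty} \pi_j x_{t-j} = w_t$

Properties

• ARMA($p, q$) causal $\iff$ roots of $\phi(z)$ lie outside the unit circle

$\psi(z) = \sum_{j=0}^{\infty} \psi_j z^j = \frac{\theta(z)}{\phi(z)}, \quad |z| \le 1$

• ARMA($p, q$) invertible $\iff$ roots of $\theta(z)$ lie outside the unit circle

$\pi(z) = \sum_{j=0}^{\infty} \pi_j z^j = \frac{\phi(z)}{\theta(z)}, \quad |z| \le 1$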
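
The coefficients $\psi_j$ of $\psi(z) = \theta(z)/\phi(z)$ can be obtained by matching powers of $z$ in $\psi(z)\phi(z) = \theta(z)$, which gives $\psi_j = \theta_j + \sum_{k=1}^{\min(j, p)} \phi_k \psi_{j-k}$ with $\theta_0 = 1$ and $\theta_j = 0$ for $j > q$. The sketch below implements that recursion; the function name and example coefficients are illustrative assumptions.

```python
def psi_weights(phi, theta, n):
    """First n coefficients psi_0..psi_{n-1} of psi(z) = theta(z) / phi(z).

    phi = [phi_1, ..., phi_p], theta = [theta_1, ..., theta_q].
    """
    psi = []
    for j in range(n):
        if j == 0:
            theta_j = 1.0
        elif j <= len(theta):
            theta_j = theta[j - 1]
        else:
            theta_j = 0.0
        ar_part = sum(phi[k - 1] * psi[j - k]
                      for k in range(1, min(j, len(phi)) + 1))
        psi.append(theta_j + ar_part)
    return psi

# ARMA(1,1) with made-up coefficients: psi_j = (phi + theta) * phi^(j-1) for j >= 1.
arma11 = psi_weights([0.5], [0.4], 4)
# AR(1): psi_j = phi^j, matching the linear-process form of AR(1).
ar1 = psi_weights([0.5], [], 4)
```

For a causal model the weights decay and are absolutely summable; for a non-causal choice such as $\phi_1 = 1.5$ the same recursion produces growing weights.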



Behavior of the ACF and PACF for causal and invertible ARMA models 





          AR(p)                   MA(q)                   ARMA(p, q)
ACF       tails off               cuts off after lag q    tails off
PACF      cuts off after lag p    tails off               tails off



21.5 Spectral Analysis 

Periodic process 

$x_t = A \cos(2\pi\omega t + \phi)$

$= U_1 \cos(2\pi\omega t) + U_2 \sin(2\pi\omega t)$

• Frequency index $\omega$ (cycles per unit time), period $1/\omega$

• Amplitude $A$

• Phase $\phi$

• $U_1 = A \cos\phi$ and $U_2 = -A \sin\phi$, often normally distributed rv's
Periodic mixture 

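$x_t = \sum_{k=1}^{q} \left( U_{k1} \cos(2\pi\omega_k t) + U_{k2} \sin(2\pi\omega_k t) \right)$

• $U_{k1}, U_{k2}$, for $k = 1, \ldots, q$, are independent zero-mean rv's with variances $\sigma_k^2$

• $\gamma(h) = \sum_{k=1}^{q} \sigma_k^2 \cos(2\pi\omega_k h)$

• $\gamma(0) = E[x_t^2] = \sum_{k=1}^{q} \sigma_k^2$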

Spectral representation of a periodic process 

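$\gamma(h) = \sigma^2 \cos(2\pi\omega_0 h)$

$= \frac{\sigma^2}{2} e^{-2\pi i \omega_0 h} + \frac{\sigma^2}{2} e^{2\pi i \omega_0 h}$

$= \int_{-1/2}^{1/2} e^{2\pi i \omega h}\, dF(\omega)$

Discrete Fourier Transform (DFT)

$d(\omega_j) = n^{-1/2} \sum_{t=1}^{n} x_t e^{-2\pi i \omega_j t}$

Fourier/Fundamental frequencies

$\omega_j = j/n$

Spectral distribution function

$F(\omega) = \begin{cases} 0 & \omega < -\omega_0 \\ \sigma^2/2 & -\omega_0 \le \omega < \omega_0 \\ \sigma^2 & \omega \ge \omega_0 \end{cases}$

• $F(-\infty) = F(-1/2) = 0$

• $F(\infty) = F(1/2) = \gamma(0)$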

Spectral density 



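$f(\omega) = \sum_{h=-\infty}^{\infty} \gamma(h) e^{-2\pi i \omega h}, \quad -\frac{1}{2} \le \omega \le \frac{1}{2}$

• Needs $\sum_{h=-\infty}^{\infty} |\gamma(h)| < \infty \implies \gamma(h) = \int_{-1/2}^{1/2} e^{2\pi i \omega h} f(\omega)\, d\omega, \quad h = 0, \pm 1, \ldots$

• $f(\omega) \ge 0$

• $f(\omega) = f(-\omega)$

• $f(\omega) = f(1 - \omega)$

• White noise: $f_w(\omega) = \sigma_w^2$

• ARMA($p, q$), $\phi(B) x_t = \theta(B) w_t$:

$f_x(\omega) = \sigma_w^2\, \frac{|\theta(e^{-2\pi i \omega})|^2}{|\phi(e^{-2\pi i \omega})|^2}$

where $\phi(z) = 1 - \sum_{k=1}^{p} \phi_k z^k$ and $\theta(z) = 1 + \sum_{k=1}^{q} \theta_k z^k$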
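
The ARMA spectral density can be evaluated by plugging $z = e^{-2\pi i \omega}$ into both polynomials. This is a minimal sketch; the function name and the AR(1) example values are illustrative assumptions.

```python
import cmath
import math

def arma_spectral_density(phi, theta, sigma2, omega):
    """f_x(omega) = sigma2 * |theta(z)|^2 / |phi(z)|^2 at z = exp(-2*pi*i*omega).

    phi = [phi_1, ..., phi_p], theta = [theta_1, ..., theta_q].
    """
    z = cmath.exp(-2j * math.pi * omega)
    ar = 1 - sum(p * z ** (k + 1) for k, p in enumerate(phi))
    ma = 1 + sum(t * z ** (k + 1) for k, t in enumerate(theta))
    return sigma2 * abs(ma) ** 2 / abs(ar) ** 2

# White noise: both polynomials are 1, so the density is flat at sigma2.
flat = arma_spectral_density([], [], 2.0, 0.3)

# AR(1) with made-up phi: |1 - phi*z|^2 = 1 - 2*phi*cos(2*pi*omega) + phi^2.
phi1, omega = 0.5, 0.1
ar1_density = arma_spectral_density([phi1], [], 1.0, omega)
ar1_closed_form = 1.0 / (1 - 2 * phi1 * math.cos(2 * math.pi * omega) + phi1 ** 2)
```

The numerical value agrees with the AR(1) closed form, and the symmetry $f(\omega) = f(-\omega)$ holds because the coefficients are real.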



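Inverse DFT

$x_t = n^{-1/2} \sum_{j=0}^{n-1} d(\omega_j) e^{2\pi i \omega_j t}$

Periodogram

$I(j/n) = |d(j/n)|^2$

Scaled Periodogram

$P(j/n) = \frac{4}{n} I(j/n) = \left( \frac{2}{n} \sum_{t=1}^{n} x_t \cos(2\pi t j/n) \right)^2 + \left( \frac{2}{n} \sum_{t=1}^{n} x_t \sin(2\pi t j/n) \right)^2$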
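
For a pure cosine at a Fourier frequency, the scaled periodogram recovers the squared amplitude. The sketch below computes the DFT directly from its definition (an $O(n^2)$ loop, not an FFT); the series length, amplitude, and frequency index are made-up values.

```python
import cmath
import math

def dft(x, j):
    """d(j/n) = n^(-1/2) * sum_{t=1}^{n} x_t * exp(-2*pi*i*(j/n)*t)."""
    n = len(x)
    total = sum(x[t - 1] * cmath.exp(-2j * math.pi * j * t / n)
                for t in range(1, n + 1))
    return total / math.sqrt(n)

def periodogram(x, j):
    """I(j/n) = |d(j/n)|^2."""
    return abs(dft(x, j)) ** 2

def scaled_periodogram(x, j):
    """P(j/n) = (4/n) * I(j/n)."""
    return 4.0 / len(x) * periodogram(x, j)

n, amplitude, j0 = 64, 3.0, 5   # illustrative choices with 0 < j0 < n/2
x = [amplitude * math.cos(2 * math.pi * j0 * t / n) for t in range(1, n + 1)]

peak = scaled_periodogram(x, j0)      # close to amplitude**2
off_peak = scaled_periodogram(x, 7)   # close to 0 at another Fourier frequency
```

The peak sits at $j_0/n$ because sines and cosines at distinct Fourier frequencies are orthogonal over $t = 1, \ldots, n$.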



22 Math 

22.1 Gamma Function 

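• Ordinary: $\Gamma(s) = \int_0^{\infty} t^{s-1} e^{-t}\, dt$

• Upper incomplete: $\Gamma(s, x) = \int_x^{\infty} t^{s-1} e^{-t}\, dt$

• Lower incomplete: $\gamma(s, x) = \int_0^{x} t^{s-1} e^{-t}\, dt$

• $\Gamma(\alpha + 1) = \alpha \Gamma(\alpha), \quad \alpha > 1$

• $\Gamma(n) = (n - 1)!, \quad n \in \mathbb{N}$

• $\Gamma(1/2) = \sqrt{\pi}$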



22.2 Beta Function 

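• Ordinary: $B(x, y) = B(y, x) = \int_0^1 t^{x-1} (1 - t)^{y-1}\, dt = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x + y)}$

• Incomplete: $B(x; a, b) = \int_0^x t^{a-1} (1 - t)^{b-1}\, dt$

• Regularized incomplete:

$I_x(a, b) = \frac{B(x; a, b)}{B(a, b)} \overset{a, b \in \mathbb{N}}{=} \sum_{j=a}^{a+b-1} \frac{(a + b - 1)!}{j!\,(a + b - 1 - j)!}\, x^j (1 - x)^{a+b-1-j}$

• $I_0(a, b) = 0, \quad I_1(a, b) = 1$

• $I_x(a, b) = 1 - I_{1-x}(b, a)$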
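
Several of the Gamma and Beta identities can be spot-checked with the standard library. The sketch below uses `math.gamma` and `math.comb`; the helper names `beta` and `reg_inc_beta_int` are illustrative, and the finite sum applies only to positive integer $a$, $b$.

```python
import math

def beta(a, b):
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def reg_inc_beta_int(x, a, b):
    """I_x(a, b) for positive integers a, b, using the finite binomial sum."""
    n = a + b - 1
    return sum(math.comb(n, j) * x ** j * (1 - x) ** (n - j)
               for j in range(a, n + 1))

gamma_5 = math.gamma(5)                 # (5 - 1)! = 24
gamma_half = math.gamma(0.5)            # sqrt(pi)
beta_2_3 = beta(2, 3)                   # 1! * 2! / 4! = 1/12
left = reg_inc_beta_int(0.3, 2, 3)      # I_x(a, b)
right = reg_inc_beta_int(0.7, 3, 2)     # I_{1-x}(b, a)
```

Here `left + right` should equal 1 up to rounding, which is the reflection identity $I_x(a, b) = 1 - I_{1-x}(b, a)$.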



22.3 Series 

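Finite

• $\sum_{k=1}^{n} k = \frac{n(n + 1)}{2}$

• $\sum_{k=1}^{n} (2k - 1) = n^2$

• $\sum_{k=1}^{n} k^2 = \frac{n(n + 1)(2n + 1)}{6}$

• $\sum_{k=1}^{n} k^3 = \left( \frac{n(n + 1)}{2} \right)^2$

• $\sum_{k=0}^{n} c^k = \frac{c^{n+1} - 1}{c - 1}, \quad c \ne 1$

Binomial

• $\sum_{k=0}^{n} \binom{n}{k} = 2^n$

• $\sum_{k=0}^{n} \binom{r + k}{k} = \binom{r + n + 1}{n}$

• $\sum_{k=0}^{n} \binom{k}{m} = \binom{n + 1}{m + 1}$

• Vandermonde's Identity:

$\sum_{k=0}^{r} \binom{m}{k} \binom{n}{r - k} = \binom{m + n}{r}$

• Binomial Theorem:

$\sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k = (a + b)^n$

Infinite

• $\sum_{k=0}^{\infty} p^k = \frac{1}{1 - p}, \quad \sum_{k=1}^{\infty} p^k = \frac{p}{1 - p}, \quad |p| < 1$

• $\sum_{k=0}^{\infty} k p^{k-1} = \frac{d}{dp} \left( \sum_{k=0}^{\infty} p^k \right) = \frac{d}{dp} \left( \frac{1}{1 - p} \right) = \frac{1}{(1 - p)^2}, \quad |p| < 1$

• $\sum_{k=0}^{\infty} \binom{\alpha}{k} p^k = (1 + p)^{\alpha}, \quad |p| < 1,\ \alpha \in \mathbb{C}$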
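
The finite and binomial identities can be verified exactly with integer arithmetic, and the infinite series approximately with long partial sums. The values of `n`, `r`, `m`, `p` and the bases used below are arbitrary illustrative choices.

```python
from math import comb

n, r, m = 10, 3, 4  # arbitrary small test values

finite = {
    "sum k": sum(range(1, n + 1)) == n * (n + 1) // 2,
    "sum odd": sum(2 * k - 1 for k in range(1, n + 1)) == n ** 2,
    "sum k^2": sum(k ** 2 for k in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6,
    "sum k^3": sum(k ** 3 for k in range(1, n + 1)) == (n * (n + 1) // 2) ** 2,
    "geometric": sum(3 ** k for k in range(n + 1)) == (3 ** (n + 1) - 1) // (3 - 1),
}

binomial = {
    "row sum": sum(comb(n, k) for k in range(n + 1)) == 2 ** n,
    "parallel": sum(comb(r + k, k) for k in range(n + 1)) == comb(r + n + 1, n),
    "upper": sum(comb(k, m) for k in range(n + 1)) == comb(n + 1, m + 1),
    "vandermonde": sum(comb(m, k) * comb(n, r - k) for k in range(r + 1)) == comb(m + n, r),
    "theorem": sum(comb(n, k) * 2 ** (n - k) * 5 ** k for k in range(n + 1)) == (2 + 5) ** n,
}

p = 0.5
geometric_sum = sum(p ** k for k in range(200))             # near 1 / (1 - p)
derivative_sum = sum(k * p ** (k - 1) for k in range(200))  # near 1 / (1 - p)^2
```

Every entry of `finite` and `binomial` should be `True`; the two partial sums approach 2 and 4 for `p = 0.5`.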



22.4 Combinatorics 

Sampling 



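k out of n    w/o replacement                                                          w/ replacement
ordered       $n^{\underline{k}} = \prod_{i=0}^{k-1} (n - i) = \frac{n!}{(n - k)!}$    $n^k$
unordered     $\binom{n}{k} = \frac{n^{\underline{k}}}{k!} = \frac{n!}{k!\,(n - k)!}$  $\binom{n - 1 + k}{k} = \binom{n - 1 + k}{n - 1}$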



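Stirling numbers, 2nd kind

$\left\{ {n \atop k} \right\} = k \left\{ {n - 1 \atop k} \right\} + \left\{ {n - 1 \atop k - 1} \right\}, \quad 1 \le k \le n; \qquad \left\{ {n \atop 0} \right\} = \begin{cases} 1 & n = 0 \\ 0 & \text{else} \end{cases}$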

Partitions 

$P_{n+k,k} = \sum_{i=0}^{k} P_{n,i}$; $\quad k > n : P_{n,k} = 0$; $\quad n \ge 1 : P_{n,0} = 0$; $\quad P_{0,0} = 1$
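
Both recurrences are easy to evaluate with memoization. This is a sketch under stated assumptions: `stirling2` and `partitions` are illustrative names, and the partition sum is taken from $i = 0$ so that the base case $P_{0,0} = 1$ is reached (this makes $P_{k,k} = 1$).

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the 2nd kind: partitions of an n-set into k blocks."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

@lru_cache(maxsize=None)
def partitions(n, k):
    """P_{n,k}: integer partitions of n into exactly k positive parts."""
    if k == 0:
        return 1 if n == 0 else 0
    if k > n:
        return 0
    # P_{n,k} = sum_{i=0}^{k} P_{n-k,i}: remove 1 from each of the k parts.
    return sum(partitions(n - k, i) for i in range(k + 1))

bell_4 = sum(stirling2(4, k) for k in range(5))          # 1 + 7 + 6 + 1 = 15
partitions_of_5 = sum(partitions(5, k) for k in range(6))  # 7 partitions of 5
```

Summing over $k$ gives the Bell number (all set partitions) and the total number of integer partitions, respectively.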

Balls and Urns: $f : B \to U$, $D$ = distinguishable, $\neg D$ = indistinguishable.

$|B| = n$, $|U| = m$    f arbitrary                                            f injective                                 f surjective                           f bijective
$B : D$, $U : D$            $m^n$                                                  $m^{\underline{n}}$ if $m \ge n$, else 0    $m!\, \left\{ {n \atop m} \right\}$    $n!$ if $m = n$, else 0
$B : \neg D$, $U : D$       $\binom{m + n - 1}{n}$                                 $\binom{m}{n}$                              $\binom{n - 1}{m - 1}$                 1 if $m = n$, else 0
$B : D$, $U : \neg D$       $\sum_{k=1}^{m} \left\{ {n \atop k} \right\}$          1 if $m \ge n$, else 0                      $\left\{ {n \atop m} \right\}$         1 if $m = n$, else 0
$B : \neg D$, $U : \neg D$  $\sum_{k=1}^{m} P_{n,k}$                               1 if $m \ge n$, else 0                      $P_{n,m}$                              1 if $m = n$, else 0